Abstract. We extend first-order logic with counting by a new operator that allows one to
formalise a limited form of recursion which can be evaluated in logarithmic space. The
resulting logic LREC has a data complexity in LOGSPACE, and it defines LOGSPACE-complete
problems like deterministic reachability and Boolean formula evaluation. We
prove that LREC is strictly more expressive than deterministic transitive closure logic with 
counting and incomparable in expressive power with symmetric transitive closure logic 
STC and transitive closure logic (with or without counting). LREC is strictly contained in 
fixed-point logic with counting FP+C. We also study an extension LREC= of LREC that
has nicer closure properties and is more expressive than both LREC and STC, but is still 
contained in FP+C and has a data complexity in LOGSPACE. 

Our main results are that LREC captures LOGSPACE on the class of directed trees and 
that LREC= captures LOGSPACE on the class of interval graphs. 



1. Introduction 

Descriptive complexity theory gives logical characterisations for most of the standard com- 
plexity classes. For example, Fagin's Theorem [7] states that a property of finite structures 
is decidable in NP if and only if it is definable in existential second-order logic $\Sigma^1_1$. More
concisely, we say that $\Sigma^1_1$ captures NP. Similarly, Immerman [13] and Vardi [28] proved that
fixed-point logic FP captures PTIME,^1 and Immerman [15] proved that deterministic transitive
closure logic DTC captures LOGSPACE. However, these and all other known logical
characterisations of PTIME and LOGSPACE and all other complexity classes below NP have 
a serious drawback — they only hold on ordered structures. (An ordered structure is a 
structure that has a distinguished binary relation which is a linear order of the elements of 
the structure.) The question of whether there are logical characterisations of these complex- 
ity classes on arbitrary, not necessarily ordered structures, is viewed as the most important 
open problem in descriptive complexity theory. For the class PTIME this problem goes back 
to Chandra and Harel's fundamental article [I] on query languages for relational databases. 

For PTIME, at least partial positive results are known. The strongest of these say that 
fixed-point logic with counting FP+C captures PTIME on all classes of graphs with excluded 
minors [11] and on the class of interval graphs [18]. It is well-known that fixed-point logic
FP (without counting) is too weak to capture PTIME on any natural class of structures that 
are not ordered. The idea that the extension FP+C by counting operators might remedy the 
weakness of FP goes back to Immerman [14]. Together with Lander he proved that FP+C
captures PTIME on the class of trees [17]. Later, Cai, Fürer, and Immerman [3] proved that
FP+C does not capture PTIME on all finite structures. 

Much less is known for LOGSPACE. In view of the results described so far, an obvious 
idea is to try to capture LOGSPACE with the extension DTC+C of deterministic transitive 
closure logic DTC by counting operators. However, Etessami and Immerman [6] proved 
that (directed) tree isomorphism is not definable in DTC+C, not even in the stronger tran- 
sitive closure logic with counting TC+C. Since Lindell [23] proved that tree isomorphism is 
decidable in LOGSPACE, this shows that DTC+C does not capture LOGSPACE. 

We introduce a new logic LREC and prove that it captures LOGSPACE on directed trees. 
An extension LREC= captures LOGSPACE on the class of interval graphs (and on the class 
of undirected trees). The logic LREC is an extension of first-order logic with counting by 
a "limited recursion operator". The logic is more complicated than the transitive closure 
and fixed-point logics commonly studied in descriptive complexity, and it may look rather 
artificial at first sight. To explain the motivation for this logic, recall that fixed-point logics 
may be viewed as extensions of first-order logic by fixed-point operators that allow it to 
formalise recursive definitions in the logics. LREC is based on an analysis of the amount 
of recursion allowed in logarithmic space computations. The idea of the limited recursion 
operator is to control the depth of the recursion by a "resource term", thereby making sure 
that we can evaluate the recursive definition in logarithmic space. Another way to arrive 
at the logic is based on an analysis of the classes of Boolean circuits that can be evaluated 
in LOGSPACE. We will take this route when we introduce the logic in Section 3.

LREC is easily seen to be (semantically) contained in FP+C. We show that LREC con- 
tains DTC+C, and as LREC captures LOGSPACE on directed trees, this containment is strict. 
Moreover, LREC is not contained in TC+C. Then we prove that undirected graph reachabil- 
ity is not definable in LREC. Hence LREC does not contain transitive closure logic TC, not 
even in its symmetric variant STC, and therefore LREC is strictly contained in FP+C. 

It can be argued that our proof of the inability of LREC to express graph reachability
reveals a weakness in our definition of the logic rather than a weakness of the limited
recursion operator underlying the logic: LREC is not closed under (first-order) logical
reductions. To remedy this weakness, we introduce an extension LREC= of LREC. It turns out
that undirected graph reachability is definable in LREC= (this is a convenient side effect of
the definition and not a deep result). Thus LREC= strictly contains symmetric transitive
closure logic with counting. We prove that LREC= captures LOGSPACE on the class of
interval graphs. To complete the picture, we prove that plain LREC, even if extended by a
symmetric transitive closure operator, does not capture LOGSPACE on the class of interval
graphs.

^1 More precisely, Immerman and Vardi's theorem holds for least fixed-point logic and the equally expressive
inflationary fixed-point logic. Our indeterminate FP refers to either of the two logics. For the counting
extension FP+C considered below, it is most convenient to use an inflationary fixed-point operator. See any
of the textbooks [5] \M \M for details.

The paper is organised as follows: After giving the necessary preliminaries in Section 2,
in Section 3 we introduce the logic LREC and prove that its data complexity is in LOGSPACE.
Then in Section 4 we prove that directed tree isomorphism and canonisation are definable
in LREC. As a consequence, LREC captures LOGSPACE on directed trees. In Section 5 we
study the expressive power of LREC and prove that undirected graph reachability is not
definable in LREC. The extension LREC= is introduced in Section 6. Finally, our results
on interval graphs are presented in Section 7. We close with a few concluding remarks and
open problems. 

2. Basic Definitions 

$\mathbb{N}$ denotes the set of all non-negative integers. For all $m, n \in \mathbb{N}$, we let
$[m, n] := \{p \in \mathbb{N} \mid m \le p \le n\}$ and $[n] := [1, n]$. Mappings $f\colon A \to B$ are extended to
tuples $\bar{a} = (a_1, \dots, a_k)$ over $A$ via $f(\bar{a}) := (f(a_1), \dots, f(a_k))$. Given a tuple
$\bar{a} = (a_1, \dots, a_k)$, let $\tilde{a} := \{a_1, \dots, a_k\}$.
If $\sim$ is an equivalence relation on a set $A$, we denote by $a/{\sim}$ the equivalence class of an
element $a$ with respect to $\sim$, and by $A/{\sim}$ the quotient of $A$ with respect to $\sim$.

A vocabulary is a finite set $\tau$ of relation symbols, where each $R \in \tau$ has a fixed arity
$\mathrm{ar}(R)$. A $\tau$-structure $A$ consists of a non-empty finite set $V(A)$, its universe, and for each
$R \in \tau$ a relation $R(A) \subseteq V(A)^{\mathrm{ar}(R)}$. For logics $L, L'$ we write $L \le L'$ if $L$ is semantically
contained in $L'$, and $L < L'$ if this containment is strict.

All logics considered in this paper are extensions of first-order logic with counting
(FO+C); see, e.g., [SI [TD1 Q51 [221 ES] for a detailed discussion of FO+C and its extensions. 
FO+C extends first-order logic by a counting operator that allows for counting the cardi- 
nality of FO+C-definable relations. It lives in a two-sorted context, where structures A 
are equipped with a number sort $N(A) := [0, |V(A)|]$. FO+C-variables are either structure
variables that range over the universe of a structure, or number variables that range over
the number sort. For each variable $u$, let $A^u := V(A)$ if $u$ is a structure variable, and
$A^u := N(A)$ if it is a number variable. Tuples $(u_1, \dots, u_k)$ and $(v_1, \dots, v_\ell)$ of variables are
compatible if $k = \ell$, and for every $i \in [k]$ the variables $u_i$ and $v_i$ have the same type. Let
$A^{(u_1, \dots, u_k)} := A^{u_1} \times \dots \times A^{u_k}$. An assignment in $A$ is a mapping $\alpha$ from the set of variables
to $V(A) \cup N(A)$, where for each variable $u$ we have $\alpha(u) \in A^u$. For tuples $\bar{u} = (u_1, \dots, u_k)$
of variables and $\bar{a} = (a_1, \dots, a_k) \in A^{\bar{u}}$, the assignment $\alpha[\bar{a}/\bar{u}]$ maps $u_i$ to $a_i$ for each $i \in [k]$,
and each variable $v \notin \tilde{u}$ to $\alpha(v)$.

FO+C is obtained by extending first-order logic with the following formula formation
rules: $p \le q$ is a formula for all number variables $p, q$; and $\#\bar{u}\,\psi = \bar{p}$ is a formula for all
tuples $\bar{u}$ of variables, all tuples $\bar{p}$ of number variables, and all formulae $\psi$. Free variables
are defined in the obvious way, with $\mathrm{free}(\#\bar{u}\,\psi = \bar{p}) := (\mathrm{free}(\psi) \setminus \tilde{u}) \cup \tilde{p}$. Formulae $\#\bar{u}\,\psi = \bar{p}$
hold in a structure $A$ under an assignment $\alpha$ in $A$ if
$|\{\bar{a} \in A^{\bar{u}} \mid (A, \alpha[\bar{a}/\bar{u}]) \models \psi\}| = \langle \alpha(\bar{p}) \rangle_A$,
where for tuples $\bar{n} = (n_1, \dots, n_k) \in N(A)^k$ we let $\langle \bar{n} \rangle_A$ be the number

$$\langle \bar{n} \rangle_A := \sum_{i=1}^{k} n_i \cdot (|V(A)| + 1)^{i-1}.$$

If $A$ is understood from the context, we write $\langle \bar{n} \rangle$ instead of $\langle \bar{n} \rangle_A$.
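
As a small illustration of this encoding, the following Python sketch reads a tuple over the number sort as a number written in base |V(A)| + 1, with the first entry as the least significant digit. The function name `tuple_value` and its argument names are our own and do not appear in the paper.

```python
def tuple_value(digits, universe_size):
    """Value of a tuple over the number sort [0, universe_size].

    The tuple is read in base universe_size + 1, first entry least
    significant: sum of n_i * (universe_size + 1) ** (i - 1).
    """
    base = universe_size + 1
    if any(not 0 <= n <= universe_size for n in digits):
        raise ValueError("every entry must lie in [0, universe_size]")
    # enumerate starts at 0, which matches the exponent i - 1 for i = 1, 2, ...
    return sum(n * base ** i for i, n in enumerate(digits))
```

With a universe of 11 elements, the pair (4, 0) has value 4, and the largest pair (11, 11) has value 12^2 - 1 = 143, so a k-tuple of number variables can address all values below (|V(A)| + 1)^k.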

We write $\varphi(u_1, \dots, u_k)$ to denote a formula $\varphi$ with $\mathrm{free}(\varphi) \subseteq \{u_1, \dots, u_k\}$. Given a
formula $\varphi(u_1, \dots, u_k)$, a structure $A$ and $(a_1, \dots, a_k) \in A^{(u_1, \dots, u_k)}$, we write $A \models \varphi[a_1, \dots, a_k]$
if $\varphi$ holds in $A$ with $u_i$ assigned to the element $a_i$, for each $i \in [k]$. We use similar notation
for substitution: For a tuple $(v_1, \dots, v_k)$ of variables that is compatible with $(u_1, \dots, u_k)$, we
let $\varphi(v_1, \dots, v_k)$ be the result of substituting $v_i$ for $u_i$ for every $i \in [k]$. We write $\varphi[A, \alpha; \bar{u}]$
for the set of all tuples $\bar{a} \in A^{\bar{u}}$ with $(A, \alpha[\bar{a}/\bar{u}]) \models \varphi$.

In many places throughout this paper we refer to various transitive closure and fixed- 
point logics (all mentioned in the introduction). Our results and remarks about the relation 
between these logics and our new logics LREC and LREC= are relevant for a reader familiar 
with descriptive complexity theory to put our results in context, but they are not essential 
to follow the technical core of this paper. Therefore, we omit the definitions and refer the 
reader to the textbooks [SJ QUI EH E2] and the paper [IB] . 



3. The Logic LREC 



In this section, we introduce LREC as a first step towards the logic LREC=, to be introduced
in Section 6. LREC is already expressive enough to capture LOGSPACE on directed
trees, but still lacks several important properties. For example, it is unable to capture
LOGSPACE on undirected trees and interval graphs (cf. Remark 7.15), and is not closed
under first-order reductions (Section 5). On the other hand, although LREC= could have
been introduced without the detour via LREC, its definition is much easier to grasp by 
developing an understanding of LREC first. 

Let us start our development of LREC by looking at how certain kinds of Boolean 
circuits can be evaluated in LOGSPACE. 

The figure on the right shows a Boolean formula, i.e., a 
Boolean circuit whose underlying graph is a tree. It is easy to 
evaluate such circuits in LOGSPACE: Start at the output node, 
determine the value of the first child recursively, then determine 
the value of the second child, and so on. We only have to store 
the current node and its value (if it has been determined al- 
ready), since the parent node and the next child of the parent 
(if any) are uniquely determined by the current node. It is known 
that Boolean formula evaluation is complete for LOGSPACE under
$\mathrm{NC}^1$-reductions [1].^2 In contrast, Boolean circuit evaluation is PTIME-complete.
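
The traversal described above can be sketched in Python as follows. This is only an illustration under our own assumptions: the tree is given by hypothetical dictionaries `gate`, `children` and `parent`, inner gates are "and", "or" or "not", and leaves carry the constants "0" or "1". The loop keeps just two pieces of state, the current node and its value (if already known); the list lookups stand in for the "next child of the parent" step and are not themselves meant as a logspace implementation.

```python
def evaluate_formula(root, gate, children, parent):
    """Evaluate an AND/OR/NOT formula tree, storing only (node, value)."""
    node, value = root, None              # value is None until node is evaluated
    while True:
        if value is None:
            kids = children.get(node, [])
            if kids:                      # inner gate: descend to the first child
                node = kids[0]
                continue
            value = gate[node] == "1"     # constant leaf
        if node == root:
            return value
        par = parent[node]
        kind = gate[par]
        siblings = children[par]
        if kind == "not":
            node, value = par, not value
        elif (kind == "and" and not value) or (kind == "or" and value):
            node, value = par, value      # the parent's value is already decided
        else:
            idx = siblings.index(node)
            if idx + 1 < len(siblings):
                node, value = siblings[idx + 1], None   # evaluate the next sibling
            else:
                # every child of an AND was 1, or every child of an OR was 0
                node, value = par, kind == "and"
```

For instance, on the formula (1 AND (NOT 0)) OR 0 the walk visits the leaves from left to right and returns to the root with value 1 without ever storing more than one node and one truth value.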

Let us now turn to formulae with threshold gates, which,
in addition to Boolean gates, may contain gates of the form
"$\ge i$" for a number $i$; such a gate outputs 1 if, and only if, at
least $i$ input gates are set to 1. An example is shown on the left.
To evaluate such formulae in LOGSPACE, we again start at the
root and evaluate the values of the children recursively. For
each node we count how many 1-values we have seen already.
To this end, when evaluating the values of the children of a
node $v$, we begin with the child with the largest subtree and
proceed to children with smaller subtrees. Note that the $i$th
child of $v$ in this order has a subtree of size at most $s/i$, where $s$ is the size of the subtree of
$v$. So, we can store a counter of up to $\log_2 i$ bits for the number of 1-values seen so far. It is
easy to extend the algorithm to formulae with other arithmetic gates such as modulo-gates.

^2 Boolean formula evaluation is only complete for LOGSPACE if input formulae are represented as graphs
(e.g., by the list [...]bs plus gate types). It was however shown in [5] that the problem is complete for
$\mathrm{NC}^1$ under ATj^ [...] if input formulae are given by their natural string encoding.
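
The size bound used in this argument (the $i$th child in order of decreasing subtree size has a subtree of size at most $s/i$) can be checked mechanically on any given tree. The sketch below does so in Python; the representation of the tree as a dictionary `children` and both function names are assumptions made for this illustration.

```python
def subtree_sizes(children, root):
    """Number of vertices in the subtree below each vertex reachable from root."""
    sizes = {}

    def size(v):
        sizes[v] = 1 + sum(size(w) for w in children.get(v, []))
        return sizes[v]

    size(root)
    return sizes


def check_size_bound(children, root):
    """True iff, for every vertex v, the i-th largest child subtree has size <= s(v) / i."""
    sizes = subtree_sizes(children, root)
    for v, kids in children.items():
        ordered = sorted(kids, key=lambda w: sizes[w], reverse=True)
        for i, w in enumerate(ordered, start=1):
            if sizes[w] * i > sizes[v]:   # integer form of sizes[w] > sizes[v] / i
                return False
    return True
```

The check succeeds on every tree, since the $i$ largest child subtrees are disjoint and each is at least as large as the $i$th one. This is what allows the counter for the first $i$ children to be charged against the shrinking subtree.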
As a more complicated example, let us consider the following type of circuit. A circuit $C$
has the $m$-path property if for all paths $P$ in $C$ the product of the in-degrees of all but the
first node on $P$ is at most $m$. For example, formulae have the 1-path property, whereas the
circuit on the right has the 16-path property. It is not hard to see that for every $k \ge 1$,
circuits $C$ having the $|C|^k$-path property can be evaluated in LOGSPACE. The idea here is
very similar to the one for evaluating circuits with threshold gates. We start at the root
node and evaluate the children recursively. After "entering" a node $v$ from one of its parent
nodes, say $p(v)$, we check whether $v$ evaluates to 1 by counting the number of children that
evaluate to one using the above-mentioned strategy, and return with this information to
$p(v)$. In order to return to $p(v)$, we need to remember $p(v)$, which we do by storing the index
of $p(v)$ among all the in-neighbours of $v$. This requires only $\log_2 d^-(v)$ bits of storage, where
$d^-(v)$ denotes the in-degree of $v$. The space for writing down the index of the predecessor
$p(v)$ for each vertex $v$ on the path from the root to the currently visited vertex is thus
bounded by the sum of the logarithms of the in-degrees of the vertices $v$ on that path.
Since $C$ has the $|C|^k$-path property, this sum is bounded by $\log_2 |C|^k$, and thus logarithmic
in the size of $C$. Another way of evaluating the circuit is to first "unravel" the circuit to a
tree (i.e., a formula) which can be done in LOGSPACE due to the $|C|^k$-path property, and
then to evaluate the formula as above.
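
To make the $m$-path property concrete, the following Python sketch computes the smallest $m$ for which a given acyclic circuit has the $m$-path property, i.e. the largest product of in-degrees over all but the first node of any path. The edge-list input format and the name `path_product` are assumptions of this sketch, and the memoised recursion is chosen for brevity, not for space efficiency.

```python
from functools import lru_cache


def path_product(edges):
    """Smallest m such that the DAG given by `edges` has the m-path property."""
    succ, indeg, nodes = {}, {}, set()
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        indeg[v] = indeg.get(v, 0) + 1
        nodes.update((u, v))

    @lru_cache(maxsize=None)
    def best(u):
        # Largest product of in-degrees of the nodes after u on a path starting at u;
        # the one-node path contributes the empty product 1.
        return max([1] + [indeg[v] * best(v) for v in succ.get(u, [])])

    return max((best(u) for u in nodes), default=1)
```

A tree yields 1, a single "diamond" yields 2, and two diamonds stacked on top of each other yield 4, which illustrates how shared subcircuits multiply along a path.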

The logic LREC allows us to recursively define sets $X$ of tuples based on graphs $G$ that
have the $|G|^k$-path property for some $k \ge 1$.

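We turn to the formal definition of the logic LREC. To define the syntax, let $\tau$ be a
vocabulary. The set of all LREC[$\tau$]-formulae is obtained by extending the formula formation
rules of FO+C[$\tau$] by the following rule: If $\bar{u}, \bar{v}, \bar{w}$ are compatible tuples of variables, $\bar{p}, \bar{r}$ are
non-empty tuples of number variables, and $\varphi_E$ and $\varphi_C$ are LREC[$\tau$]-formulae, then

$$\varphi := [\mathrm{lrec}_{\bar{u},\bar{v},\bar{p}}\; \varphi_E, \varphi_C](\bar{w}, \bar{r}) \tag{3.1}$$

is an LREC[$\tau$]-formula, and we let
$\mathrm{free}(\varphi) := (\mathrm{free}(\varphi_E) \setminus (\tilde{u} \cup \tilde{v})) \cup (\mathrm{free}(\varphi_C) \setminus (\tilde{u} \cup \tilde{p})) \cup \tilde{w} \cup \tilde{r}$.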

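Figure 1: The graph $G$ from Example 3.1. Each vertex is labelled with a subset of $[0, 11]$.

To define the semantics of LREC[$\tau$]-formulae, let $A$ be a $\tau$-structure and $\alpha$ an assignment
in $A$. The semantics of LREC[$\tau$]-formulae that are not of the form (3.1) is defined as usual.

Let $\varphi$ be an LREC[$\tau$]-formula of the form (3.1). We define a set $X \subseteq A^{\bar{u}} \times \mathbb{N}$ recursively
as follows. We consider $E := \varphi_E[A, \alpha; \bar{u}, \bar{v}]$ as the edge relation of a directed graph $G$ with
vertex set $V := A^{\bar{u}}$. Moreover, for each vertex $\bar{a} \in V$ we think of the set
$C(\bar{a}) := \{\langle \bar{n} \rangle \mid \bar{n} \in \varphi_C[A, \alpha[\bar{a}/\bar{u}]; \bar{p}]\}$ of integers as the label of $\bar{a}$.
Let $\bar{a}E := \{\bar{b} \in V \mid \bar{a}\bar{b} \in E\}$ and $E\bar{b} := \{\bar{a} \in V \mid \bar{a}\bar{b} \in E\}$.
Then, for all $\bar{a} \in V$ and $\ell \in \mathbb{N}$,

$$(\bar{a}, \ell) \in X \;:\Longleftrightarrow\; \ell > 0 \text{ and } \left|\left\{ \bar{b} \in \bar{a}E \;\middle|\; \left(\bar{b}, \left\lfloor \frac{\ell - 1}{|E\bar{b}|} \right\rfloor\right) \in X \right\}\right| \in C(\bar{a}).$$

Notice that $X$ contains only elements $(\bar{a}, \ell)$ with $\ell > 0$. Hence, the recursion eventually
stops at $\ell = 0$. We call $X$ the relation defined by $\varphi$ in $(A, \alpha)$. Finally, we let

$$(A, \alpha) \models \varphi \;:\Longleftrightarrow\; (\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle) \in X.$$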
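
The recursive definition of $X$ can be evaluated directly on small examples. The Python sketch below assumes that the graph and the labels have already been computed from $\varphi_E$ and $\varphi_C$ and are passed in explicitly; the names `lrec_relation`, `vertices`, `edges` and `labels` are ours. It applies the rule that $(a, \ell)$ belongs to $X$ iff $\ell > 0$ and the number of out-neighbours $b$ of $a$ with $(b, \lfloor (\ell - 1)/|Eb| \rfloor) \in X$ lies in the label of $a$. Memoisation is used for convenience, so the sketch illustrates the semantics only and not the logspace evaluation.

```python
from functools import lru_cache


def lrec_relation(vertices, edges, labels):
    """Return a membership test in_x(a, l) for the relation X of an lrec operator."""
    out = {a: [b for (x, b) in edges if x == a] for a in vertices}
    indeg = {b: sum(1 for (_, y) in edges if y == b) for b in vertices}

    @lru_cache(maxsize=None)
    def in_x(a, l):
        if l <= 0:                       # X only contains pairs with l > 0
            return False
        # Every out-neighbour b of a has in-degree >= 1, so the division is safe.
        count = sum(1 for b in out[a] if in_x(b, (l - 1) // indeg[b]))
        return count in labels[a]

    return in_x
```

On the graph with a single edge from a to b in which every vertex has label {0}, the test accepts (a, 1) but rejects (a, 2): with resource 2 the child b still has resource 1 and is in $X$, so a sees one successful child, and 1 is not in its label.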

Example 3.1 (Boolean circuit evaluation). Let $\sigma := \{E, P_\wedge, P_\vee, P_\neg, P_0, P_1\}$. A Boolean
circuit $C$ may be viewed as a $\sigma$-structure, where $E(C)$ is the edge relation of $C$, and $P_\star(C)$
contains all $\star$-gates for $\star \in \{\wedge, \vee, \neg, 0, 1\}$. Suppose $C$ has the $|C|$-path property. Then,

$$\varphi(z) := \exists r_1, r_2 \bigl([\mathrm{lrec}_{x,y,p}\; \varphi_E, \varphi_C](z, (r_1, r_2)) \wedge \forall r\,(r \le r_1 \wedge r \le r_2)\bigr)$$

with $\varphi_E(x, y) := E(x, y)$ and

$$\varphi_C(x, p) := (P_\wedge(x) \wedge \#y\,E(x,y) = p) \vee (P_\vee(x) \wedge \text{"}p > 0\text{"}) \vee (P_\neg(x) \wedge \text{"}p = 0\text{"}) \vee P_1(x)$$

states that gate $z$ evaluates to 1.

For example, let $C$ be the first circuit at the beginning of this section, and let $\alpha$ be the
assignment in $C$ mapping $z$ to the root of $C$, $r_1$ to 4, and $r_2$ to 0. Figure 1 shows the graph
$G = (V, E)$ with $V := C^x$, $E := \varphi_E[C, \alpha; x, y]$, and labels defined by $\varphi_C$. The vertices $a$–$k$ of $G$
are precisely the vertices of $C$, and each vertex is labelled with a subset of $N(C) = [0, 11]$.
Let $X$ be the relation defined by $[\mathrm{lrec}_{x,y,p}\; \varphi_E, \varphi_C](z, (r_1, r_2))$ in $(C, \alpha)$. For a leaf $v$ of $G$,
we have $(v, 1) \in X$ (and, in fact, $(v, \ell) \in X$ for any $\ell > 0$) if and only if 0 occurs in the
label of $v$. Hence, $(v, 1) \in X$ for $v \in \{c, e, h, j, k\}$, but $(f, 1) \notin X$ and $(i, 1) \notin X$. Since
$(e, 1) \in X$ and 1 occurs in the label of $b$, we also have $(b, 2) \in X$; as for the leaves, we also
have $(b, \ell) \in X$ for any $\ell \ge 2$. However, note that $(g, 2) \notin X$ (and, in fact, $(g, \ell) \notin X$ for all
$\ell \ge 0$), because there are only three children $v$ of $g$ with $(v, 1) \in X$, but 3 does not appear
in the label of $g$. Consequently, $(d, 3) \in X$. Since we now have $(b, 3) \in X$, $(c, 3) \in X$, and
$(d, 3) \in X$, we have $(a, 4) \in X$, and therefore $(C, \alpha) \models \varphi$.

While for the circuit $C$ above, we could have replaced the tuple $(r_1, r_2)$ in the formula $\varphi$
by a single number variable $r$, it is not hard to construct circuits $C$ which have the $|C|$-path
property, but for which a single number variable $r$ does not suffice. □

Example 3.2 (Deterministic transitive closure). Let $G = (V, E)$ be a directed graph and
$a, b \in V$. Then there is a deterministic path from $a$ to $b$ in $G$ if there exists a path $v_1, \dots, v_n$
from $a = v_1$ to $b = v_n$ in $G$ such that for every $i \in [n - 1]$, $v_{i+1}$ is the unique out-neighbour
of $v_i$. Figure 2(a) shows a directed graph with a deterministic path from $c$ to $d$.

Figure 2: A graph with a deterministic path, and the labelled graph defined by the formulae
$\varphi_E$ and $\varphi_C$ in Example 3.2 from that graph. (a) A graph with a deterministic path from
$c$ to $d$. (b) The associated labelled graph defined by $\varphi_E$ and $\varphi_C$.

Let $\psi(\bar{u}, \bar{v})$ be an LREC[$\tau$]-formula, and let $\bar{s}, \bar{t}$ be tuples of variables such that $\bar{u}, \bar{v}, \bar{s}, \bar{t}$
are pairwise compatible. We devise a formula $\varphi(\bar{s}, \bar{t})$ such that for any $\tau$-structure $A$ and
assignment $\alpha$ in $A$, we have $(A, \alpha) \models \varphi(\bar{s}, \bar{t})$ iff in the graph $G = (V, E)$ defined by $V := A^{\bar{u}}$
and $E := \psi[A, \alpha; \bar{u}, \bar{v}]$ there is a deterministic path from $\alpha(\bar{s})$ to $\alpha(\bar{t})$. Note that there is
such a path precisely if, in the graph obtained from $G$ by reversing the edges, there is a path
$v_n, \dots, v_1$ from $\alpha(\bar{t})$ to $\alpha(\bar{s})$ such that for every $i \in [n - 1]$, $v_{i+1}$ is the unique in-neighbour
of $v_i$. Therefore, we can choose $\varphi$ like this:

$$\varphi := \exists \bar{r}\; [\mathrm{lrec}_{\bar{v},\bar{u},\bar{p}}\; \varphi_E(\bar{v}, \bar{u}), \varphi_C(\bar{v}, \bar{p})](\bar{t}, \bar{r}), \tag{3.2}$$

where $\bar{p}$ and $\bar{r}$ are $|\bar{u}|$-tuples of number variables, and

$$\varphi_E(\bar{v}, \bar{u}) := \psi(\bar{u}, \bar{v}) \wedge \forall \bar{v}'\,(\psi(\bar{u}, \bar{v}') \to \bar{v}' = \bar{v}), \qquad \varphi_C(\bar{v}, \bar{p}) := \bar{v} = \bar{s} \vee (\bar{v} \ne \bar{s} \wedge \bar{p} \ne \bar{0}).$$

Informally, $\varphi_E(\bar{v}, \bar{u})$ removes all edges $ab$ of $G$, where $a$ has more than one out-neighbour,
and reverses the remaining edges. All that remains is to check whether there is a path from
$\alpha(\bar{t})$ to $\alpha(\bar{s})$ in the graph defined by $\varphi_E$. The node labelling formula $\varphi_C$ is chosen in such
a way that the latter is true iff $(\alpha(\bar{t}), \ell)$, for an $\ell \le |V|$, appears in the relation $X$ defined
by $\varphi$ in $(A, \alpha)$. If, for example, $G$ is the graph in Figure 2(a), and if $\alpha(s) = c$ and $\alpha(t) = d$,
then the labelled graph defined by $\varphi_E$ and $\varphi_C$ is as shown in Figure 2(b), and it is easy to
see that $(d, 4) \in X$, while, for example, $(e, \ell) \notin X$ for all $\ell \ge 0$. □
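
The encoding of this example can be tried out on small graphs. The Python sketch below is an illustration under our own assumptions: vertices are single elements instead of tuples, so the resource ranges over [0, |V|], and the function names are hypothetical. `deterministic_path` follows unique out-edges directly, while `dtc_via_lrec` builds the pruned, reversed graph with the labels described above (all numbers for the source, all positive numbers elsewhere) and evaluates the recursion for $X$.

```python
from functools import lru_cache


def deterministic_path(vertices, edges, s, t):
    """Direct check: follow unique out-edges from s until t is reached or the walk stops."""
    out = {a: [b for (x, b) in edges if x == a] for a in vertices}
    current = s
    for _ in range(len(vertices)):
        if current == t:
            return True
        if len(out[current]) != 1:
            return False
        current = out[current][0]
    return False


def dtc_via_lrec(vertices, edges, s, t):
    """Same question, answered through the lrec encoding of deterministic reachability."""
    n = len(vertices)
    out_deg = {a: sum(1 for (x, _) in edges if x == a) for a in vertices}
    # Keep only edges whose tail has a unique out-neighbour, and reverse them.
    rev = [(v, u) for (u, v) in edges if out_deg[u] == 1]
    succ = {a: [b for (x, b) in rev if x == a] for a in vertices}
    indeg = {b: sum(1 for (_, y) in rev if y == b) for b in vertices}
    label = {a: set(range(0 if a == s else 1, n + 1)) for a in vertices}

    @lru_cache(maxsize=None)
    def in_x(a, l):
        if l <= 0:
            return False
        count = sum(1 for b in succ[a] if in_x(b, (l - 1) // indeg[b]))
        return count in label[a]

    # The existential quantifier over the resource becomes a search over l.
    return any(in_x(t, l) for l in range(n + 1))
```

In the reversed graph every vertex has in-degree at most 1, so each step costs exactly one unit of the resource, and a resource of at most |V| suffices for any deterministic path.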

From now on, we use

$$[\mathrm{dtc}_{\bar{u},\bar{v}}\; \psi](\bar{s}, \bar{t}) \tag{3.3}$$

as an abbreviation for the LREC-formula in (3.2).



Remark 3.3. In the preceding two examples, the set $X$ turned out to possess a certain
monotonicity property: If $(\bar{a}, \ell) \in X$ for some $\ell$, then $(\bar{a}, \ell') \in X$ for all $\ell' \ge \ell$. In general,
however, the relation $X$ defined by an lrec operator does not possess this property. For
example, consider the formula $\varphi := [\mathrm{lrec}_{u,v,p}\; E(u, v), \text{"}p = 0\text{"}](u, p)$. Now let $G$ be the
graph consisting of a single edge $(a, b)$, and let $\alpha$ be the assignment mapping $u$ to $a$ and $p$
to 2. Then the relation $X$ defined by $\varphi$ in $(G, \alpha)$ contains $(a, 1)$, but not $(a, 2)$.

The following theorem shows that the data complexity of LREC is in LOGSPACE. 

Theorem 3.4. For every vocabulary $\tau$, and every LREC[$\tau$]-formula $\varphi$ there is a deterministic
logspace Turing machine that, given a $\tau$-structure $A$ and an assignment $\alpha$ in $A$, decides
whether $(A, \alpha) \models \varphi$.
Proof. We proceed by induction on the structure of $\varphi$. The case where $\varphi$ is not of the form
(3.1) is easy. Let $\varphi$ be of the form (3.1), i.e., let

$$\varphi = [\mathrm{lrec}_{\bar{u},\bar{v},\bar{p}}\; \varphi_E, \varphi_C](\bar{w}, \bar{r}).$$

Let $G = (V, E)$ be the graph with $V = A^{\bar{u}}$ and $E = \varphi_E[A, \alpha; \bar{u}, \bar{v}]$, let
$C(\bar{a}) := \{\langle \bar{n} \rangle \mid \bar{n} \in \varphi_C[A, \alpha[\bar{a}/\bar{u}]; \bar{p}]\}$ for all $\bar{a} \in V$, and let $X \subseteq V \times \mathbb{N}$ be the relation defined by $\varphi$ in $(A, \alpha)$. We
construct a deterministic logspace Turing machine that decides whether $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle) \in X$.

The machine is constructed in two steps. The first step consists of constructing a deterministic
logspace Turing machine $M_1$ that, given $A$ and $\alpha$ as input, computes a labelled
directed tree $T$ that is obtained basically from "unravelling" $G$ starting at $\alpha(\bar{w})$ with
"resource" $\langle \alpha(\bar{r}) \rangle$. The second step is to devise a deterministic logspace Turing machine $M_2$
that takes $T$ as input and decides whether its root, $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$, belongs to $X$. The
composition of $M_1$ and $M_2$ finally yields the desired machine.

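Let $k := |\bar{r}|$. We define a labelled directed tree $T$ whose set $W$ of vertices consists of
all the sequences $((a_0, \ell_0), \dots, (a_m, \ell_m))$ of pairs from $V \times \mathbb{N}$ for some $m \in \mathbb{N}$ such that

(1) $(a_0, \ell_0) = (\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$,

(2) $a_{i+1} \in a_i E$ for all $i < m$, and

(3) $\ell_{i+1} = \left\lfloor \frac{\ell_i - 1}{|E a_{i+1}|} \right\rfloor$ for all $i < m$.

There is an edge from $((a_0, \ell_0), \dots, (a_m, \ell_m))$ to $((a'_0, \ell'_0), \dots, (a'_{m'}, \ell'_{m'}))$ in $T$ if $m' = m + 1$,
and $(a'_i, \ell'_i) = (a_i, \ell_i)$ for all $i \le m$. We label each vertex $v = ((a_0, \ell_0), \dots, (a_m, \ell_m)) \in W$
with the set $C(v) := C(a_m)$, and with the number $\mathrm{fail}(v) \in \{0, 1\}$ such that $\mathrm{fail}(v) = 1$ iff
$\ell_m = 0$. Note that $\mathrm{fail}(v) = 1$ only if $v$ is a leaf in $T$. Clearly, $T$ is a labelled directed tree
rooted at $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$.
Define $Y \subseteq W$ such that

$$v \in Y \iff |\{w \in Y \mid w \text{ is a child of } v\}| \in C(v) \text{ and } \mathrm{fail}(v) = 0 \quad (\text{for every } v \in W).$$

Claim 1. For every $v = ((a_0, \ell_0), \dots, (a_m, \ell_m)) \in W$ we have $v \in Y$ if and only if $(a_m, \ell_m) \in X$.
In particular, $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle) \in X$ if and only if $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle) \in Y$.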

Proof. The proof is by induction on the rank $r_v$ of $v$ in $T$: if $v$ is a leaf in $T$, then $r_v = 0$;
and if $v$ is not a leaf in $T$, then $r_v$ is one more than the maximum of the ranks of $v$'s children.
For every $v = ((a_0, \ell_0), \dots, (a_m, \ell_m)) \in W$, let $\lambda(v) := (a_m, \ell_m)$.

Suppose that $r_v = 0$, that is, $v$ is a leaf in $T$. Consider $(a, \ell) = \lambda(v)$. Then $aE$ is the
empty set or $\ell = 0$. First consider the case that $\ell = 0$. In this case, $(a, \ell) \notin X$ by the
definition of $X$. But we also have $\mathrm{fail}(v) = 1$, which implies $v \notin Y$. Next consider the case
that $aE$ is the empty set and $\ell > 0$. In this case,

$$v \in Y \iff 0 \in C(v) = C(a) \iff (a, \ell) \in X,$$

as desired.

Suppose now that $r_v = r + 1$, and that the claim is true for vertices $w$ with $r_w \le r$. In
particular, since $v$ is not a leaf we must have $\mathrm{fail}(v) = 0$. This implies $\ell > 0$, and

$$v \in Y \iff |\{w \in Y \mid w \text{ is a child of } v\}| \in C(v) \iff |\{\lambda(w) \in X \mid w \text{ is a child of } v\}| \in C(v) \tag{3.4}$$

by the induction hypothesis. Let $W'$ be the set of all children $w$ of $v$ such that $\lambda(w) \in X$, and let
$f\colon W' \to A^{\bar{u}}$ be such that for all $w \in W'$, $f(w)$ is the first component of $\lambda(w)$. Then $f$ is a
bijection from $W'$ to the set of all tuples $b \in aE$ with

$$\left(b, \left\lfloor \frac{\ell - 1}{|Eb|} \right\rfloor\right) \in X. \tag{3.5}$$

As a consequence, the number of all tuples $b \in aE$ with (3.5) is precisely $|W'|$. Hence, by
(3.4) and $\ell > 0$,

$$v \in Y \iff |W'| \in C(a) \iff (a, \ell) \in X. \qquad □$$

By Claim 1 it suffices to compute $T$, and use $T$ to decide whether its root, $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$,
belongs to $Y$. This is precisely what the two machines $M_1$ and $M_2$ mentioned at the beginning
of this proof do. We now prove the existence of such machines.

Claim 2. There is a deterministic logspace Turing machine that takes $A$ and $\alpha$ as input
and outputs $T$.

Proof. We first construct a deterministic logspace Turing machine $M$ that takes $A$ and $\alpha$
as input and outputs the vertices of $T$ (represented as sequences $((a_0, \ell_0), \dots, (a_m, \ell_m))$ as
above). This machine makes use of a deterministic logspace Turing machine $M_E$ that takes
$A$, $\alpha$ and a pair $(a, b) \in V^2$ as input and decides whether $ab \in E$. Such a machine exists by
the induction hypothesis. Once $M$ is constructed, we can easily compute the edges and the
labels of $T$, using a deterministic logspace Turing machine for computing the labels $C(a)$ for
each $a \in V$ as guaranteed by the induction hypothesis.

In what follows, we describe how $M$ computes the vertices of $T$ from $A$ and $\alpha$. We
basically do a depth-first search in $G$ starting in $\alpha(\bar{w})$ with "resources" $\langle \alpha(\bar{r}) \rangle$. In each
step, we visit some vertex $a \in V$. We also maintain a number $\ell < |N(A)|^k$, the length $m$
of the path $P = (a_0, \dots, a_m)$ on which $a$ was reached from $\alpha(\bar{w})$, and for each $i \in [m]$ a
number $e_i \in [0, |Ea_i| - 1]$ with the following property. For each $b \in A^{\bar{u}}$ let $b_0, \dots, b_p$ be
the elements of $Eb$ ordered lexicographically according to their representation in the input
string; let $\mathrm{pre}(b, i) := b_i$. Then the number $e_i$ will have the property that $a_{i-1} = \mathrm{pre}(a_i, e_i)$.
When we move from $a$ to some vertex $b \in aE$ we update $\ell$ to be

$$\mathrm{decr}(\ell, b) := \left\lfloor \frac{\ell - 1}{|Eb|} \right\rfloor.$$

This ensures that the space needed to store the numbers $e_1, \dots, e_m$ is logarithmic in $|V(A)|$
(which we shall prove later). Finally, upon visiting $a$ for the first time, we write the sequence
$((a_0, \ell_0), \dots, (a_m, \ell_m))$ to the output tape, where the $\ell_i$ are the values for $\ell$ maintained along
the path $P$.

More precisely, we proceed as follows. In the first step, we let $a := \alpha(\bar{w})$, $\ell := \langle \alpha(\bar{r}) \rangle$ and
$m := 0$. Let $a \in V$, $\ell < |N(A)|^k$, $m \in \mathbb{N}$ and numbers $e_1, \dots, e_m$ be given. Furthermore, let
$a_0, \dots, a_m$ be such that $a_m = a$, and for each $i \in [m]$, $a_{i-1} = \mathrm{pre}(a_i, e_i)$; and let $\ell_0, \dots, \ell_m$
be such that $\ell_0 = \langle \alpha(\bar{r}) \rangle$ and for each $i \in [m]$, $\ell_i = \mathrm{decr}(\ell_{i-1}, a_i)$. Notice that each of the
$a_i$ and $\ell_i$ can be computed in logarithmic space given $a$, $m$, $e_1, \dots, e_m$ and $i$ as input. Let
$\prec$ be some fixed ordering on $aE$. There are now two possible cases:

(1) $m$ was increased in the last move, or there was no last move. This corresponds
to a first visit of the vertex $a$ with $\ell$ on the current path. Therefore we write the
sequence $(a_0, \ell_0), \dots, (a_m, \ell_m)$ to the output tape. We then let $j := 0$ be the index
of the child of $a$ to be visited next.

(2) $m$ was decreased in the last move. This corresponds to a return from a child $b$ of
$a$. Therefore, we do not write anything to the output tape. Let $b$ be the vertex
visited in the last step, let $j'$ be its rank in $aE$ with respect to $\prec$ (i.e., the number
of elements in $aE$ that precede $b$ with respect to $\prec$), and let $j := j' + 1$.

If $\ell > 0$ and $j \le |aE| - 1$, we update $a$ to be the element of rank $j$ in $aE$ with respect to $\prec$; we
also update $\ell$ to be $\mathrm{decr}(\ell, a)$, increase $m$ by one, and let $e_m$ be such that $a_{m-1} = \mathrm{pre}(a, e_m)$.
Otherwise, if $\ell = 0$ or $j = |aE|$, we do the following. If $m = 0$, we stop; and if $m > 0$ we
update $a$ to be $a_{m-1}$, set $\ell$ to $\ell_{m-1}$, and decrease $m$ by one. It is not hard to see that this
procedure outputs all the vertices of $T$.

Maintaining the vertex $a \in V$ and the vertex from the respective last step needs space
$O(\log |V(A)|)$. Notice that

$$\ell_0 = \langle \alpha(\bar{r}) \rangle \le (|V(A)| + 1)^k - 1.$$

Since $\ell_i = \mathrm{decr}(\ell_{i-1}, a_i)$ for every $i \in [m]$, this implies

$$m \le \ell_0 < (|V(A)| + 1)^k \quad\text{and}\quad \prod_{i=1}^{m} |Ea_i| \le \frac{\ell_0}{\ell_m} \le (|V(A)| + 1)^k. \tag{3.6}$$

In particular, $m$ together with a bit indicating whether $m$ was increased or decreased in the
last move can be maintained in space $O(\log |V(A)|)$. Furthermore, each of the numbers $e_i$
needs space $\eta_i := \lceil \log_2 |Ea_i| \rceil$. Let $\mathcal{I}$ be the set of all $i \in [m]$ with $|Ea_i| \ge 2$. By (3.6) we
have $|\mathcal{I}| \le \log_2 (|V(A)| + 1)^k$. Hence,

$$\sum_{i=1}^{m} \eta_i = \sum_{i \in \mathcal{I}} \lceil \log_2 |Ea_i| \rceil \le |\mathcal{I}| + \log_2 \prod_{i \in \mathcal{I}} |Ea_i| \le 2 \log_2 (|V(A)| + 1)^k.$$

In particular, we can store $e_1, \dots, e_m$ as a single number $e$ with $\eta := 2 \log_2 ((|V(A)| + 1)^k - 1)$
bits, reserving $\eta_i$ bits in $e$ for the number $e_i$. To extract $e_i$ from $e$, we start by computing
$\eta_m$ from $a = a_m$, let $e_m$ be the number represented by the last $\eta_m$ bits of $e$, and let $a_{m-1} :=
\mathrm{pre}(a_m, e_m)$. We then compute $\eta_{m-1}$ from $a_{m-1}$, let $e_{m-1}$ be the number represented by the
$\eta_{m-1}$ bits of $e$ that immediately precede the last $\eta_m$ bits, and let $a_{m-2} := \mathrm{pre}(a_{m-1}, e_{m-1})$.
We continue this way until $e_i$ is found. □
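
The packing of the indices $e_1, \dots, e_m$ into a single number with variable-width fields can be sketched as follows. The sketch is ours and simplifies one point: in the proof the widths are recomputed on the fly from the vertices $a_m, a_{m-1}, \dots$, whereas here the in-degrees are passed in as a list. The function names `width`, `pack` and `unpack` are hypothetical.

```python
def width(indegree):
    """Bits needed for an index in [0, indegree - 1], i.e. ceil(log2(indegree))."""
    return (indegree - 1).bit_length()


def pack(indices, indegrees):
    """Concatenate the indices into one number; the last index ends up in the lowest bits."""
    e = 0
    for idx, d in zip(indices, indegrees):
        if not 0 <= idx < d:
            raise ValueError("index out of range for its in-degree")
        e = (e << width(d)) | idx
    return e


def unpack(e, indegrees):
    """Recover the indices from the lowest bits upwards, i.e. from e_m back to e_1."""
    indices = []
    for d in reversed(indegrees):
        w = width(d)
        indices.append(e & ((1 << w) - 1))
        e >>= w
    return indices[::-1]
```

A vertex of in-degree 1 contributes a field of width 0, so long stretches of the path through vertices with a unique in-neighbour cost no space at all, which matches the role of the set of indices with in-degree at least 2 in the estimate above.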

Claim 3. There is a deterministic logspace Turing machine that takes $T$ as input and
decides whether the root $(\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$ of $T$ belongs to $Y$.

Proof. Let $v_0 := (\alpha(\bar{w}), \langle \alpha(\bar{r}) \rangle)$. On input $T$, a deterministic logspace Turing machine can
decide whether $v_0 \in Y$ as follows. The idea is to visit the vertices in a depth-first fashion,
starting in $v_0$, and count, for each node that is visited, the number of children that belong
to $Y$. To implement this in logarithmic space, we proceed in steps as follows.

In each step, we are in a vertex $v$ of $T$, which is $v_0$ in the first step. With each vertex
$v_i$ on the path $v_0, v_1, \dots, v_m$ from $v_0$ to $v$ we associate $2 \cdot \ell_v(i)$ bits of memory for counters
$t(i), c(i)$ from 0 to $2^{\ell_v(i)} - 1$, where $\ell_v(i)$ will be specified below. The counter $t(i)$ simply
counts the number of children of $v_i$ that have already been processed (excluding the vertex
in whose subtree we are currently in), while $c(i)$ counts the number of children of $v_i$ that
have already been processed and belong to $Y$. We guarantee that the sum of the numbers
$2 \cdot \ell_v(i)$ over $i \in [0, m]$ is bounded by $6 \cdot \log_2 |W|$. Moreover, it will be easy to determine
$\ell_v(i)$ from $v$ and $i$ in logspace; so we can store the counters in a bit string of length at most
$6 \cdot \log_2 |W|$, and identify the bits that belong to $t(i)$ and $c(i)$ from that bit string in logspace,
given $v$ and $i$. By visiting the children of each vertex in decreasing order of the number of
vertices in the children's subtrees, we ensure that there is always enough space to keep the
counters in memory until all children have been processed.

We now give a more detailed description of a single step. In the initial step, we set
$v := v_0$ and $t(0) := c(0) := 0$. For the other steps, we need the following definitions:

• The size $s(v)$ of a vertex $v \in W$ is the number of vertices in the subtree of $T$ rooted
at $v$. It is easy to compute this number in logarithmic space: all we need to do is
to initialise a counter, iterate over all vertices of $T$, and for each such vertex move
upwards and increment the counter by 1 if $v$ is reached.

• Let $v \in W$, and let $w_1, \dots, w_p$ be the children of $v$ such that $s(w_1) \ge s(w_2) \ge \dots \ge
s(w_p)$; children of the same size are ordered in lexicographic order based on their
representation in the input string. For every $j \in [p]$, let $\mathrm{child}(v, j) := w_j$. The
vertex $\mathrm{child}(v, j)$ is easy to compute in logarithmic space, given $v$ and $j$.

• Let $v \in W$, let $v_0, v_1, \dots, v_m$ be the path from $v_0$ to $v$, and let $i \in [0, m]$. Then

$$\ell_v(i) := \begin{cases} \lceil \log_2 j \rceil, & \text{if } i < m \text{ and } \mathrm{child}(v_i, j) = v_{i+1}, \\ \lceil \log_2 |W| \rceil, & \text{if } i = m. \end{cases}$$

This number is easy to compute in logspace given $v$ and $i$ as input.
Suppose that $v$ is the current vertex, and that $v_0, v_1, \dots, v_m$ is the path from $v_0$ to $v$. If
$t(m)$ is smaller than the number of children of $v$, then we set $v := \mathrm{child}(v, t(m) + 1)$ and
$t(m + 1) := c(m + 1) := 0$, and continue with the next step. Otherwise, we check whether
$c(m) \in C(v)$ and $\mathrm{fail}(v) = 0$. If this is the case, we say that $v$ succeeds. In any case, whether
$v$ succeeds or not, we do the following:

(1) If $m = 0$, then we accept $T$ iff $v$ succeeds.

(2) If $m > 0$, then we increase $t(m - 1)$ by one, and if $v$ succeeds we also increase
$c(m - 1)$ by one. Afterwards, we let $v$ be the parent of $v$, and continue with the
next step. Note that with the updated $v$, $2\ell_v(m - 1)$ bits suffice to store $t(m - 1)$
and $c(m - 1)$.

It should be clear that this procedure correctly decides whether $v_0 \in Y$.

Concerning the space for the counters, let $j_0, j_1, \dots, j_{m-1}$ be such that $\mathrm{child}(v_i, j_i) = v_{i+1}$
for every $i < m$. Then

\[
\sum_{i<m} \ell_v(i) = \sum_{i<m} \lceil \log_2 j_i \rceil \le \sum_{i<m,\ j_i \ge 2} (1 + \log_2 j_i) = |\{ i < m \mid j_i \ge 2 \}| + \log_2 \prod_{i<m} j_i. \tag{3.7}
\]

Now observe that

\[
s(v_{i+1}) \le \frac{s(v_i)}{j_i} \quad \text{for every } i \in [0, m-1]. \tag{3.8}
\]

To see this, consider $w_j := \mathrm{child}(v_i, j)$ for every $j \le j_i$. By the choice of $\mathrm{child}(\cdot, \cdot)$, we have
$s(w_1) \ge \dots \ge s(w_{j_i})$. Hence, if $s(w_{j_i}) = s(v_{i+1}) > s(v_i)/j_i$, then $s(w_1) + \dots + s(w_{j_i}) > s(v_i)$,
which is impossible. As a consequence of (3.8), we have

\[
|\{ i < m \mid j_i \ge 2 \}| \le \log_2 |W| \quad \text{and} \quad \prod_{i<m} j_i \le \prod_{i<m} \frac{s(v_i)}{s(v_{i+1})} = \frac{s(v_0)}{s(v_m)} \le |W|. \tag{3.9}
\]

Altogether, this yields

\[
\sum_{i \le m} \ell_v(i) \overset{(3.7)}{\le} |\{ i < m \mid j_i \ge 2 \}| + \log_2 \prod_{i<m} j_i + \log_2 |W| + 1 \overset{(3.9)}{\le} 3 \log_2 |W| + 1,
\]

which implies $\sum_{i \le m} \ell_v(i) \le 3 \log_2 |W|$, and therefore $\sum_{i \le m} 2\ell_v(i) \le 6 \log_2 |W|$. □

Altogether, this concludes the proof of Theorem 3.4. □
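The space estimate for the counters can be checked experimentally. The following is a minimal sketch, not the logspace machine itself: it orders the children of each vertex by decreasing subtree size, charges $\lceil \log_2 j \rceil$ bits for descending into the $j$-th child, and lets one compare the accumulated cost along every root-to-vertex path with the bound $2 \log_2 |W|$ that follows from (3.7) and (3.9). The dictionary representation of the tree, the function names, and the tie-break by vertex identifier (standing in for the lexicographic order on the input encoding) are assumptions made for illustration.

```python
import math


def subtree_sizes(children, root):
    """Return s(v) for every vertex v below root (children: dict vertex -> list)."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, ()))
    sizes = {}
    # 'order' lists every parent before its children, so reverse it.
    for v in reversed(order):
        sizes[v] = 1 + sum(sizes[c] for c in children.get(v, ()))
    return sizes


def path_counter_bits(children, root):
    """For each vertex v, sum ceil(log2 j_i) over the edges of the root-to-v path,
    where j_i is the rank of the chosen child in the size-sorted child list."""
    sizes = subtree_sizes(children, root)
    bits = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        # Decreasing subtree size; ties broken by vertex id (illustrative assumption).
        ordered = sorted(children.get(v, ()), key=lambda c: (-sizes[c], c))
        for j, c in enumerate(ordered, start=1):
            # (j - 1).bit_length() equals ceil(log2 j) for every j >= 1.
            bits[c] = bits[v] + (j - 1).bit_length()
            stack.append(c)
    return bits


if __name__ == "__main__":
    star = {0: list(range(1, 9))}
    bits = path_counter_bits(star, 0)
    print(max(bits.values()), 2 * math.log2(len(bits)))
```

The current vertex additionally needs $\lceil \log_2 |W| \rceil$ bits for its own counter, which is where the third $\log_2 |W|$ term in the proof comes from.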

Remark 3.5. It follows from Example 3.2 that DTC+C $\le$ LREC. This containment is strict,
as directed tree isomorphism is definable in LREC (we will show this in the next section),
but not in DTC+C. On the other hand, it is easy to see that the relation $X$ defined by
an LREC-formula of the form (3.1) in an interpretation $(A, \alpha)$ can be defined in fixed-point
logic with counting FP+C. Hence, LREC $\le$ FP+C, and this containment is strict since we
show in Section 5 that undirected graph reachability is not LREC-definable.



4. Capturing Logspace on Directed Trees 

In this section we show that LREC captures LOGSPACE on the class of all directed trees.
Our construction is based on Lindell's LOGSPACE tree canonisation algorithm [23]. Note,
however, that Lindell's algorithm makes essential use of a linear order on the tree's vertices
that is given implicitly by the encoding of the tree. Here we do not have such a linear order,
so we cannot directly translate Lindell's algorithm to an LREC-formula. We show that we
can circumvent using the linear order if we have a formula for directed tree isomorphism.
Hence, our first task is to construct such a formula.

4.1. Directed Tree Isomorphism. Let $T$ be a directed tree. For every $v \in V(T)$ let $T_v$
be the subtree of $T$ rooted at $v$, let $\mathrm{size}(v) := |V(T_v)|$ be the size of $v$, and let $\#_s(v)$ be the
number of children of $v$ of size $s$. We construct an LREC[$\{E\}$]-formula $\varphi_{\cong}(x, y)$ that is true
in a directed tree $T$ with interpretations $v, w \in V(T)$ for $x, y$ if and only if $T_v \cong T_w$. We
assume that $|V(T)| \ge 4$, but it is easy to adapt the construction to directed trees with fewer
than 4 vertices.

We implement the following recursive procedure to check whether $T_v \cong T_w$:

(1) If $\mathrm{size}(v) \ne \mathrm{size}(w)$ or if $\#_s(v) \ne \#_s(w)$ for some $s \in [0, |V(T_v)| - 1]$, then return
"$T_v \not\cong T_w$".

(2) If for all children $\hat v$ of $v$ there is a child $\hat w$ of $w$ and a number $k$ such that

(a) $T_{\hat v} \cong T_{\hat w}$,

(b) there are exactly $k$ children $\tilde w$ of $w$ with $T_{\tilde w} \cong T_{\hat v}$, and

(c) there are exactly $k$ children $\tilde v$ of $v$ with $T_{\tilde v} \cong T_{\hat w}$,
then return "$T_v \cong T_w$".

(3) Return "$T_v \not\cong T_w$".

Clearly, this procedure outputs "$T_v \cong T_w$" if and only if $T_v \cong T_w$.
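The recursive procedure above can be transcribed almost literally into ordinary code. The following is a minimal sketch that ignores the space bounds and the LREC encoding entirely; the dictionary representation of the tree and the names `make_iso`, `profile`, `v_hat`, `w_hat` are assumptions made for illustration, and the memoisation is only there to keep the running time reasonable.

```python
from collections import Counter


def make_iso(children):
    """Return a function iso(v, w) deciding whether T_v and T_w are isomorphic.

    children: dict mapping a vertex to the list of its children; several trees
    may live in the same dict as long as their vertex names are distinct.
    """
    size_cache, iso_cache = {}, {}

    def size(v):
        if v not in size_cache:
            size_cache[v] = 1 + sum(size(c) for c in children.get(v, ()))
        return size_cache[v]

    def profile(v):
        # size(v) together with #_s(v) for every s (as a multiset of child sizes).
        return size(v), Counter(size(c) for c in children.get(v, ()))

    def iso(v, w):
        if (v, w) not in iso_cache:
            iso_cache[(v, w)] = check(v, w)
        return iso_cache[(v, w)]

    def check(v, w):
        # Step (1): different size or different number of children of some size.
        if profile(v) != profile(w):
            return False
        kids_v, kids_w = children.get(v, ()), children.get(w, ())
        # Step (2): every child of v needs a partner w_hat and a matching count k.
        for v_hat in kids_v:
            found = False
            for w_hat in kids_w:
                if not iso(v_hat, w_hat):                           # (a)
                    continue
                k_w = sum(1 for w_t in kids_w if iso(w_t, v_hat))   # (b)
                k_v = sum(1 for v_t in kids_v if iso(v_t, w_hat))   # (c)
                if k_w == k_v:
                    found = True
                    break
            if not found:
                return False    # Step (3)
        return True

    return iso
```

For instance, two roots that each have two children of size 3 agree in step (1), but if one root has two paths below it and the other has a path and a "cherry", the counts in (b) and (c) differ and the procedure rejects.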

To simplify the presentation we fix a directed tree $T$ and an assignment $\alpha$ in $T$, but the
construction will be uniform in $T$ and $\alpha$.

We construct a directed graph $G = (V, E)$ with labels $C(v) \subseteq \mathbb{N}$ for each $v \in V$ as follows.
Let $V := N(T) \times V(T)^4 \times N(T)$. The first component of each vertex is its type; the meaning
of the other components will become clear soon. Although $G$ will not be a tree, it is helpful
to think of it as a decision tree for deciding $T_v \cong T_w$. For each pair $(v, w) \in V(T)^2$, we
designate the vertex $a_{v,w} = (0, v, w, v, w, 0)$ to stand for "$T_v \cong T_w$".

Figure 3: Sketch of "decision tree" for deciding $T_v \cong T_w$. Here, $\hat v, \tilde v$ range over the children
of $v$; $\hat w, \tilde w$ range over the children of $w$; and $k \in [\#_{\mathrm{size}(\hat v)}(v)]$. Moreover, $\hat v, \tilde v, \hat w, \tilde w$
all have the same size. Labels indicate which integers $n$ belong to the set $C(a)$
labelling each vertex $a$. If $\hat v$ is the only child of $v$ of size $\mathrm{size}(\hat v)$, then $a_{\hat v, \hat w}$ is the
only child of $a_{v,w,\hat v,\hat w,k}$.

Let us call $(v, w)$ easy if $v, w$ satisfy the condition in line 1 of the procedure (i.e.,
$\mathrm{size}(v) \ne \mathrm{size}(w)$, or $\#_s(v) \ne \#_s(w)$ for some $s \in [0, |V(T_v)| - 1]$). Note that the set of all
such easy pairs is LREC-definable.³ If $(v, w)$ is easy, then $a_{v,w}$ has no outgoing edges and
$C(a_{v,w}) = \emptyset$. On the other hand, if $(v, w)$ is not easy, then $G$ contains the following edges
and labels (see Figure 3 for an illustration):

• The vertex $a_{v,w}$ has an outgoing edge to $a_{v,w,\hat v} := (1, v, w, \hat v, w, 0)$, for each child
$\hat v$ of $v$. Furthermore, $C(a_{v,w}) = \{\#\text{ of children of } v\}$. This corresponds to "for all
children $\hat v$ of $v$ ..." in the above procedure's step 2.

• The vertex $a_{v,w,\hat v}$ has an outgoing edge to $a_{v,w,\hat v,\hat w,k} := (2, v, w, \hat v, \hat w, k)$, for each child
$\hat w$ of $w$ with $\mathrm{size}(\hat w) = \mathrm{size}(\hat v)$ and each $k \in [\#_{\mathrm{size}(\hat v)}(v)]$. Furthermore, $C(a_{v,w,\hat v}) =
N(T) \setminus \{0\}$. This branching corresponds to "... there is a child $\hat w$ of $w$ and a number
$k$ such that ...".

• The vertex $a_{v,w,\hat v,\hat w,k}$ has an outgoing edge to $a_{\hat v,\hat w}$. If $\hat v$ is the only child of $v$ of
size $\mathrm{size}(\hat v)$, then this is the only outgoing edge, and we let $C(a_{v,w,\hat v,\hat w,k}) = \{1\}$.
Otherwise, there are additional outgoing edges to $a^i_{v,w,\hat v,\hat w,k} := (3 + i, v, w, \hat v, \hat w, k)$ for
$i \in \{0, 1\}$, and we let $C(a_{v,w,\hat v,\hat w,k}) = \{3\}$. This corresponds to conditions 2a-2c.

• The vertex $a^0_{v,w,\hat v,\hat w,k}$ has outgoing edges to $a_{\hat v,\tilde w}$ for each child $\tilde w$ of $w$ of size $\mathrm{size}(\hat v)$,
and $a^1_{v,w,\hat v,\hat w,k}$ has outgoing edges to $a_{\tilde v,\hat w}$ for each child $\tilde v$ of $v$ of size $\mathrm{size}(\hat w) = \mathrm{size}(\hat v)$.
Furthermore, $C(a^i_{v,w,\hat v,\hat w,k}) = \{k\}$. The vertex $a^i_{v,w,\hat v,\hat w,k}$ corresponds to condition 2b
for $i = 0$, and to 2c for $i = 1$.

From the above description it should be easy to construct LREC[$\{E\}$]-formulae $\varphi_E(\bar u, \bar u')$ and
$\varphi_C(\bar u, \bar p)$, where $\bar u = (q_t, x, y, \hat x, \hat y, q_k)$ and $\bar u' = (q'_t, x', y', \hat x', \hat y', q'_k)$, such that $\varphi_E[T, \alpha; \bar u, \bar u'] =
E$, and $\{\langle \bar n \rangle \mid \bar n \in \varphi_C[T, \alpha[a/\bar u]; \bar p]\} = C(a)$ for each $a \in V$.
Let

\[
\varphi_{\cong}(x, y) := \exists \bar r\, [\mathrm{lrec}_{\bar u, \bar u', \bar p}\ \varphi_E, \varphi_C]((0, x, y, x, y, 0), \bar r),
\]

where $\bar r$ is a 5-tuple of number variables.⁴ Let $X$ be the relation defined by $\varphi_{\cong}$ in $(T, \alpha)$.
Then:

Lemma 4.1. Let $v, w \in V(T)$.

(1) If $(a_{v,w}, \ell) \in X$ for some $\ell \in \mathbb{N}$, then $T_v \cong T_w$.

(2) If $T_v \cong T_w$, then for all $\ell \ge \mathrm{size}(v)^5$ we have $(a_{v,w}, \ell) \in X$.

³ Using the dtc-operator (3.3) from Example 3.2 we can construct an LREC[$\{E\}$]-formula defining the
descendant relation between vertices in a directed tree, and using this formula it is easy to determine the
size and the number of children of size $s$ of a vertex.

⁴ We use 0 as a constant, but clearly we can modify $\varphi_{\cong}$ to a formula that does not use the constant 0.

Proof. Ad (1): The proof is by induction on $\mathrm{size}(v)$. If $\mathrm{size}(v) = 1$ and $(a_{v,w}, \ell) \in X$, then
$(v, w)$ is not easy, which implies $\mathrm{size}(w) = 1$ and hence $T_v \cong T_w$.

Now let $\mathrm{size}(v) = s + 1$ for some $s \ge 1$. If $(a_{v,w}, \ell) \in X$, then $(v, w)$ is not easy, implying
$\mathrm{size}(w) = s + 1$ and $\#_t(v) = \#_t(w)$ for all $t \in \mathbb{N}$. It is then easy to see that for all children
$\hat v$ of $v$ in $T$ there is a child $\hat w$ of $w$ in $T$ and a number $k \in [1, \#_{\mathrm{size}(\hat v)}(v)]$ such that

• $(a_{\hat v,\hat w}, \ell') \in X$ for some $\ell' \in \mathbb{N}$,

• there are exactly $k$ children $\tilde w$ of $w$ such that $(a_{\hat v,\tilde w}, \ell') \in X$ for some $\ell' \in \mathbb{N}$, and

• there are exactly $k$ children $\tilde v$ of $v$ such that $(a_{\tilde v,\hat w}, \ell') \in X$ for some $\ell' \in \mathbb{N}$.

By the induction hypothesis, this corresponds to step 2 of the procedure given at the
beginning of Section 4.1, and therefore implies $T_v \cong T_w$.

Ad (2): The proof is by induction on $\mathrm{size}(v)$. Suppose that $\mathrm{size}(v) = 1$ and $T_v \cong T_w$. Then
$\mathrm{size}(w) = 1$, which implies that $(v, w)$ is not easy. Furthermore, as $v$ has no children in $T$,
we know that $a_{v,w}$ has no children in $G$ and $C(a_{v,w}) = \{0\}$. Hence, $(a_{v,w}, \ell) \in X$ for all
$\ell \ge 1 = \mathrm{size}(v)^5$.

Now suppose that $\mathrm{size}(v) = s + 1$ for some $s \ge 1$, and $T_v \cong T_w$. First note that $(v, w)$
is not easy. Let $\ell \ge (s + 1)^5$. We show that $(a_{v,w,\hat v}, \ell - 1) \in X$ for all children $\hat v$ of $v$, which
implies $(a_{v,w}, \ell) \in X$. Let $\hat v$ be a child of $v$ in $T$. Since $T_v \cong T_w$, there is a child $\hat w$ of $w$ of
size $s' := \mathrm{size}(\hat v)$ and a number $k \in [\#_{s'}(v)]$ such that

• $T_{\hat v} \cong T_{\hat w}$,

• there are exactly $k$ children $\tilde w$ of $w$ of size $s'$ such that $T_{\tilde w} \cong T_{\hat v}$, and

• there are exactly $k$ children $\tilde v$ of $v$ of size $s'$ such that $T_{\tilde v} \cong T_{\hat w}$.
Pick such $\hat w$ and $k$.

Let us deal with the case $\#_{s'}(v) = 1$ first. In this case, $a_{\hat v,\hat w}$ is the only child of
$a_{v,w,\hat v,\hat w,k}$; moreover, $a_{v,w,\hat v,\hat w,k}$ and $a_{\hat v,\hat w}$ have exactly one incoming edge each. Since $T_{\hat v} \cong
T_{\hat w}$ and $\ell - 3 \ge (s')^5$, the induction hypothesis implies $(a_{\hat v,\hat w}, \ell - 3) \in X$. Consequently
$(a_{v,w,\hat v}, \ell - 1) \in X$.

In the following we assume $\#_{s'}(v) \ge 2$. Let $d := 3 \cdot \#_{s'}(v)^2$. Note that all vertices in
Figure 3 except the type-0 vertices have exactly one incoming edge, and that the in-degree
$d'$ of a type-0 vertex $a_{v',w'}$, where $v', w'$ are children of $v$ and $w$, respectively, of size $s'$, is at
most $d$, because it has incoming edges from

• vertices $a_{v,w,v',w',k}$, where $v$ and $w$ are the (unique) parents of $v'$ and $w'$, respectively,
and $k \in [\#_{s'}(v)]$;

• vertices $a^0_{v,w,v',\hat w,k}$, where $v, w, k$ are as above and $\hat w$ is a child of $w$ of size $s'$; and

• vertices $a^1_{v,w,\hat v,w',k}$, where $v, w, k$ are as above and $\hat v$ is a child of $v$ of size $s'$.

Let $\ell' := \lfloor (\ell - 4)/d \rfloor$. Then $\ell' \ge (s')^5$; this estimate uses $(s + 1)^5 \ge s^5 + s^4$, $\#_{s'}(v) \cdot s' \le s$,
and $\#_{s'}(v) \ge 2$. Hence, by the induction hypothesis we have:

• $(a_{\hat v,\hat w}, \lfloor (\ell - 3)/d' \rfloor) \in X$ (note that $\lfloor (\ell - 3)/d' \rfloor \ge \ell'$).

• There are exactly $k$ children $\tilde w$ of $w$ of size $s'$ with $(a_{\hat v,\tilde w}, \lfloor (\ell - 4)/d' \rfloor) \in X$ (note
that $\lfloor (\ell - 4)/d' \rfloor \ge \ell'$), which implies $(a^0_{v,w,\hat v,\hat w,k}, \ell - 3) \in X$.

• There are exactly $k$ children $\tilde v$ of $v$ of size $s'$ with $(a_{\tilde v,\hat w}, \lfloor (\ell - 4)/d' \rfloor) \in X$, which
implies that $(a^1_{v,w,\hat v,\hat w,k}, \ell - 3) \in X$.

It follows immediately that $(a_{v,w,\hat v,\hat w,k}, \ell - 2) \in X$, and therefore $(a_{v,w,\hat v}, \ell - 1) \in X$. □
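The resource estimate in the case $\#_{s'}(v) \ge 2$ can be verified directly. The derivation below is a sketch and not the authors' original display; the abbreviation $q := \#_{s'}(v)$ is introduced here only for readability. It assumes, as in the proof, that $\ell \ge (s+1)^5$, $d = 3q^2$, $q \cdot s' \le s$ and $q \ge 2$.

```latex
\[
\begin{aligned}
\left\lfloor \frac{\ell - 4}{d} \right\rfloor
  &\ge \frac{\ell - 4 - d}{d}
   \ge \frac{s^5 + s^4 - 4 - 3q^2}{3q^2}
   && \text{since } \ell \ge (s+1)^5 \ge s^5 + s^4 \\
  &\ge \frac{q^5 (s')^5 + q^4 (s')^4 - 4 - 3q^2}{3q^2}
   && \text{since } q \cdot s' \le s \\
  &\ge \frac{3q^2 (s')^5}{3q^2} = (s')^5
   && \text{since } q \ge 2 .
\end{aligned}
\]
```

In the last step, $q \ge 2$ gives $q^5 \ge 8q^2 \ge 3q^2$ and $q^4 (s')^4 \ge q^4 \ge 4q^2 \ge 3q^2 + 4$, so the negative terms are absorbed.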

Corollary 4.2. Let $(v, w) \in V(T)^2$. Then, $T \models \varphi_{\cong}[v, w]$ if and only if $T_v \cong T_w$.

Proof. $T \models \varphi_{\cong}[v, w]$ holds precisely when $(a_{v,w}, |N(T)|^{|\bar r|} - 1) \in X$. Furthermore, $|N(T)|^{|\bar r|} -
1 \ge |V(T)|^5 \ge \mathrm{size}(v)^5$. Therefore, by the preceding lemma, $(a_{v,w}, |N(T)|^{|\bar r|} - 1) \in X$ is
equivalent to $T_v \cong T_w$, and the claim follows. □

4.2. Defining an Order on Directed Trees. Lindell's tree canonisation algorithm is
based on a logspace-computable linear order on isomorphism classes of directed trees. We
show that a slightly refined version of this order is LREC-definable.

Let $T$ be a directed tree. For each $v \in V(T)$ let $\pi(v) := (\mathrm{size}(v), \#_1(v), \dots, \#_{\mathrm{size}(v)-1}(v))$
be the profile of $v$.⁵ Let $\preceq$ be the total preorder⁶ on $V(T)$ where $v \prec w$ whenever

(1) $\pi(v) < \pi(w)$ lexicographically, or

(2) $\pi(v) = \pi(w)$ and the following is true: Let $v_1, \dots, v_k$ and $w_1, \dots, w_k$ be the children
of $v$ and $w$, respectively, ordered such that $v_1 \preceq \dots \preceq v_k$ and $w_1 \preceq \dots \preceq w_k$. Then
there is an $i \in [k]$ with $v_i \prec w_i$, and for all $j < i$ we have $v_j \preceq w_j$ and $w_j \preceq v_j$.

Note that $v \preceq w$ and $w \preceq v$ imply $T_v \cong T_w$. We show that $\preceq$ is LREC-definable.

To simplify the presentation, we again fix a directed tree $T$ and an assignment $\alpha$, and
we assume that $|V(T)| \ge 4$.
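The definition of the preorder via profiles and sorted child lists can be read as a recursive three-way comparison. The following is a minimal sketch that disregards logspace and LREC-definability; the dictionary representation of the tree and the names `make_compare`, `profile` and `compare` are assumptions made for illustration.

```python
from functools import cmp_to_key


def make_compare(children):
    """Return compare(v, w): -1 if v comes strictly before w, 1 if strictly after,
    and 0 if both orderings hold (the subtrees are then isomorphic).

    children: dict mapping a vertex to the list of its children.
    """

    def size(v):
        return 1 + sum(size(c) for c in children.get(v, ()))

    def profile(v):
        # pi(v) = (size(v), #_1(v), ..., #_{size(v)-1}(v))
        s = size(v)
        kid_sizes = [size(c) for c in children.get(v, ())]
        return (s,) + tuple(kid_sizes.count(t) for t in range(1, s))

    def compare(v, w):
        pv, pw = profile(v), profile(w)
        if pv != pw:                      # case (1): lexicographic comparison
            return -1 if pv < pw else 1
        # case (2): equal profiles imply equally many children; sort both
        # child lists by the order itself and look for the first difference.
        kids_v = sorted(children.get(v, ()), key=cmp_to_key(compare))
        kids_w = sorted(children.get(w, ()), key=cmp_to_key(compare))
        for a, b in zip(kids_v, kids_w):
            c = compare(a, b)
            if c != 0:
                return c
        return 0

    return compare
```

Under this reading, a vertex with a single path of length two below it has profile $(3, 0, 1)$ and therefore precedes a vertex with two leaf children, whose profile is $(3, 2, 0)$.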

We apply the lrec-operator to the following graph $G = (V, E)$ with labels $C(v) \subseteq \mathbb{N}$
for each $v \in V$. Let $V := N(T) \times V(T)^4 \times N(T)$. For each $(v, w) \in V(T)^2$, the vertex
$a_{v,w} = (0, v, w, v, w, 0)$ represents "$v \prec w$". If $\pi(v) < \pi(w)$, then $a_{v,w}$ has no outgoing edges
and $C(a_{v,w}) = \{0\}$. If $\pi(v) > \pi(w)$, then $a_{v,w}$ has no outgoing edges and $C(a_{v,w}) = \emptyset$. Note
that the relation "$\pi(v) < \pi(w)$" is LREC-definable.

Suppose that $\pi(v) = \pi(w)$. For all $t, u \in V(T)$ let $\theta_u(t)$ be the number of children $u'$ of
$u$ with $T_{u'} \cong T_t$. Call a child $\hat v$ of $v$ good if $\theta_v(\hat v) > \theta_w(\hat v)$ and for all children $v'$ of $v$ with
$\mathrm{size}(v') < \mathrm{size}(\hat v)$ we have $\theta_v(v') = \theta_w(v')$. Then it is not hard to see that $v \prec w$ precisely if
there is a good child $\hat v$ of $v$, a child $\hat w$ of $w$ of size $s := \mathrm{size}(\hat v)$ and a $k \in [\#_s(v)]$ such that:

• $\hat v \prec \hat w$;

• there are exactly $k$ children $\tilde w$ of $w$ of size $s$ with $\tilde w \prec \hat v$;

• there are exactly $k$ children $\tilde v$ of $v$ of size $s$ with $\tilde v \prec \hat w$ and $T_{\tilde v} \not\cong T_{\hat v}$;

• and for all children $w'$ of $w$ of size $s$ with $w' \prec \hat v$ we have $\theta_v(w') = \theta_w(w')$.
The "decision tree" in Figure 4 checks precisely these conditions.

⁵ Lindell's order can be obtained by replacing $\pi(v)$ with $\pi'(v) := (\mathrm{size}(v), \#\text{children of } v)$.
⁶ That is, $\preceq$ is a preorder on $V(T)$ such that for all $v, w \in V(T)$ we have $v \preceq w$ or $w \preceq v$.

[Figure 4 node labels: the node $(1, v, w, \hat v, \hat w, k)$ is labelled "$n = 1$ if $\#_{\mathrm{size}(\hat v)}(v) = 1$; $n = 4$ otherwise";
the nodes $(2, v, w, \hat v, \hat w, k)$, $(3, v, w, \hat v, \hat w, k)$ and $(4, v, w, \hat v, \hat w, k)$ are each labelled "$n = k$".]

Figure 4: Gadget for deciding $v \prec w$ when $\pi(v) = \pi(w)$. Here, $\hat v$ ranges over good children
of $v$; $\tilde v$ ranges over children of $v$ of size $s := \mathrm{size}(\hat v)$ and $T_{\tilde v} \not\cong T_{\hat v}$; $\hat w, \tilde w$ range over
children of $w$ of size $s$; $w'$ ranges over children of $w$ of size $s$ with $\theta_v(w') = \theta_w(w')$;
and $k \in [\#_s(v)]$. The edges from $(2, v, w, \hat v, \hat w, k)$ to $(t, \dots)$ for $t \in \{2, 3, 4\}$ exist
only if $\#_s(v) > 1$. Labels indicate which integers $n$ belong to the set $C(a)$ labelling
each vertex $a$.

Using the formula $\varphi_{\cong}$ from the previous section it is now straightforward to construct
LREC[$\{E\}$]-formulae $\varphi_E(\bar u, \bar u')$ and $\varphi_C(\bar u, \bar p)$ that define the edge relation $E$ of $G$ and the sets
$C(a)$ for each $a \in V$, where $\bar u$ and $\bar u'$ are as in the definition of $\varphi_{\cong}$. Let

\[
\varphi_{\prec}(x, y) := \exists \bar r\, [\mathrm{lrec}_{\bar u, \bar u', \bar p}\ \varphi_E, \varphi_C]((0, x, y, x, y, 0), \bar r),
\]

where $\bar r$ is a 5-tuple of number variables. Let $X$ be the relation defined by $\varphi_{\prec}$ in $(T, \alpha)$. We
then have:

Lemma 4.3. Let $v, w \in V(T)$.

(1) If $(a_{v,w}, \ell) \in X$ for some $\ell \in \mathbb{N}$, then $v \prec w$.

(2) If $v \prec w$, then for all $\ell \ge \mathrm{size}(v)^5$ we have $(a_{v,w}, \ell) \in X$.

Proof. The proof is similar to the proof of Lemma 4.1.

Ad (1): The proof is by induction on $\mathrm{size}(v)$. Suppose $\mathrm{size}(v) = 1$. If $(a_{v,w}, \ell) \in X$, then
$\pi(v) \le \pi(w)$. We cannot have $\pi(v) = \pi(w)$, since otherwise $0 \notin C(a_{v,w})$ (see Figure 4), so
that $X$ would contain at least one tuple of the form $((1, v, w, \hat v, \cdot, \cdot), \ell - 1)$ with $\hat v$ a child
of $v$. But such a tuple does not exist, since $v$ has no children. It follows that $\pi(v) < \pi(w)$,
which implies $v \prec w$.

Now let $\mathrm{size}(v) = s + 1$ for some $s \ge 1$. If $(a_{v,w}, \ell) \in X$, then as above we have
$\pi(v) \le \pi(w)$. If $\pi(v) < \pi(w)$, we have $v \prec w$. So, suppose that $\pi(v) = \pi(w)$, that is,
$\mathrm{size}(w) = s + 1$ and $\#_t(v) = \#_t(w)$ for all $t \in \mathbb{N}$. It is then easy to see that there is a good
child $\hat v$ of $v$, a child $\hat w$ of $w$ of size $s' := \mathrm{size}(\hat v)$, and a $k \in [\#_{s'}(v)]$ such that

• $(a_{\hat v,\hat w}, \ell') \in X$ for some $\ell' \in \mathbb{N}$,

• there are exactly $k$ children $\tilde w$ of $w$ of size $s'$ such that $(a_{\tilde w,\hat v}, \ell') \in X$ for some $\ell' \in \mathbb{N}$,

• there are exactly $k$ children $\tilde v$ of $v$ of size $s'$ with $T_{\tilde v} \not\cong T_{\hat v}$ such that $(a_{\tilde v,\hat w}, \ell') \in X$ for
some $\ell' \in \mathbb{N}$, and

• all $k$ children $w'$ of $w$ of size $s'$ with $(a_{w',\hat v}, \ell') \in X$ for some $\ell' \in \mathbb{N}$ satisfy $\theta_v(w') =
\theta_w(w')$.

By the induction hypothesis, this means that

• $\hat v \prec \hat w$,

• there are exactly $k$ children $\tilde w$ of $w$ of size $s'$ such that $\tilde w \prec \hat v$,

• there are exactly $k$ children $\tilde v$ of $v$ of size $s'$ with $T_{\tilde v} \not\cong T_{\hat v}$ such that $\tilde v \prec \hat w$, and

• all $k$ children $w'$ of $w$ of size $s'$ with $w' \prec \hat v$ satisfy $\theta_v(w') = \theta_w(w')$.
As pointed out in Section 4.2, this implies $v \prec w$.

Ad (2): The proof is by induction on $\mathrm{size}(v)$. If $\mathrm{size}(v) = 1$ and $v \prec w$, then $\pi(v) < \pi(w)$.
By the construction of $G$ this immediately implies $(a_{v,w}, \ell) \in X$ for all $\ell \ge 1 = \mathrm{size}(v)^5$.

Now suppose that $\mathrm{size}(v) = s + 1$ for some $s \ge 1$, and $v \prec w$. First note that $\pi(v) \le \pi(w)$.
If $\pi(v) < \pi(w)$, then $(a_{v,w}, \ell) \in X$ for all $\ell \ge 1$, and in particular, for all $\ell \ge \mathrm{size}(v)^5$. So,
assume that $\pi(v) = \pi(w)$.

Since $v \prec w$, there is a good child $\hat v$ of $v$, a child $\hat w$ of $w$ of size $s' := \mathrm{size}(\hat v)$ and a
$k \in [\#_{s'}(v)]$ such that

• $\hat v \prec \hat w$,

• there are exactly $k$ children $\tilde w$ of $w$ of size $s'$ with $\tilde w \prec \hat v$,

• there are exactly $k$ children $\tilde v$ of $v$ of size $s'$ with $\tilde v \prec \hat w$ and $T_{\tilde v} \not\cong T_{\hat v}$, and

• for all children $w'$ of $w$ of size $s'$ with $w' \prec \hat v$ we have $\theta_v(w') = \theta_w(w')$.
Pick such $\hat v$, $\hat w$ and $k$.

If $\#_{s'}(v) = 1$, then $a_{\hat v,\hat w}$ is the only child of $(1, v, w, \hat v, \hat w, k)$, and $(1, v, w, \hat v, \hat w, k)$ and
$a_{\hat v,\hat w}$ each have exactly one incoming edge. Since $\hat v \prec \hat w$ and $\ell - 2 \ge (s')^5$, the induction
hypothesis implies $(a_{\hat v,\hat w}, \ell - 2) \in X$, and consequently, $(a_{v,w}, \ell) \in X$.

In the following we assume $\#_{s'}(v) \ge 2$. Let $d := 4\,\#_{s'}(v)^2$. Note that all vertices in
Figure 4 except the type-0 vertices have exactly one incoming edge. The type-0 vertices
$a_{v',w'}$, where $v', w'$ are children of $v$ and $w$, respectively, of size $s'$, have incoming edges from

• vertices $(1, v, w, v', w', k)$, where $k \in [\#_{s'}(v)]$;

• vertices $(2, w, v, w', v'', k)$, where $k$ is as above and $v''$ is a child of $v$ of size $s'$;

• vertices $(3, v, w, v'', w', k)$, where $k$ is as above and $v''$ is a good child of $v$; and

• vertices $(4, w, v, w', v'', k)$, where $k$ is as above and $v''$ is a child of $v$ of size $s'$.

Hence, the in-degree of $a_{v',w'}$ is at most $d$. For the type-0 vertices $a_{w',v'}$, where $v', w'$ are
children of $v$ and $w$, respectively, of size $s'$, this is symmetric.

Let $\ell \ge (s + 1)^5$ and $\ell' := \lfloor (\ell - 3)/d \rfloor$. Then $\ell' \ge (s')^5$; this estimate uses
$(s + 1)^5 \ge s^5 + s^4$, $\#_{s'}(v) \cdot s' \le s$, and $\#_{s'}(v) \ge 2$. Hence, by the induction hypothesis we have:

• $(a_{\hat v,\hat w}, \lfloor (\ell - 2)/d_1 \rfloor) \in X$, where $d_1 \le d$ is the in-degree of $a_{\hat v,\hat w}$,

• there are exactly $k$ children $\tilde w$ of $w$ of size $s'$ with $(a_{\tilde w,\hat v}, \lfloor (\ell - 3)/d_2 \rfloor) \in X$, where
$d_2 \le d$ is the in-degree of the vertices $a_{\tilde w,\hat v}$,

• there are exactly $k$ children $\tilde v$ of $v$ of size $s'$ with $T_{\tilde v} \not\cong T_{\hat v}$ and $(a_{\tilde v,\hat w}, \lfloor (\ell - 3)/d_3 \rfloor) \in X$,
where $d_3 \le d$ is the in-degree of the vertices $a_{\tilde v,\hat w}$, and

• for all $k$ children $w'$ of $w$ of size $s'$ with $(a_{w',\hat v}, \lfloor (\ell - 3)/d_2 \rfloor) \in X$ we have $\theta_v(w') =
\theta_w(w')$.

It follows immediately that $(a_{v,w}, \ell) \in X$. □

Corollary 4.4. Let $v, w \in V(T)$. Then, $T \models \varphi_{\prec}[v, w]$ if and only if $v \prec w$.



4.3. Canonising Directed Trees. We now construct an LREC-formula $\gamma(p, q)$ such that
for every directed tree $T$ we have $T \cong ([|V(T)|], \gamma[T; p, q])$. Since DTC captures LOGSPACE
on ordered structures [15] and a linear order is available on the number sort, we immediately
obtain:

Theorem 4.5. LREC captures LOGSPACE on the class of directed trees.

Since directed tree isomorphism is in LOGSPACE by Lindell's tree canonisation algorithm,
but not TC+C-definable [BJ, we obtain:

Corollary 4.6. LREC $\not\le$ TC+C on the class of all directed trees.

We use l-recursion to define a set $X \subseteq V(T) \times N(T)^2$ (for simplicity, we omit the
"resources" in the description) such that for every $v \in V(T)$ the set $X_v := \{(m, n) \in
N(T)^2 \mid (v, m, n) \in X\}$ is the edge relation of an isomorphic copy $([|V(T_v)|], X_v)$ of $T_v$.
Each vertex of $T$ is numbered by its position in the preorder traversal sequence, e.g., the
root is numbered 1, its first child $v_1$ is numbered 2, its second child $v_2$ is numbered $2 + \mathrm{size}(v_1)$,
and so on.
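The preorder numbering can be illustrated by a small sketch. It is not the LREC construction: in place of the definable order on children it uses a hypothetical isomorphism-invariant sort key `canon_key` (the sorted tuple of the children's keys), which is an assumption made purely for illustration, as are the dictionary representation and the function names.

```python
def canon_key(children, v):
    """Isomorphism-invariant key of the subtree rooted at v (illustrative stand-in
    for the order on children used in the text)."""
    return tuple(sorted(canon_key(children, c) for c in children.get(v, ())))


def canonical_edges(children, root):
    """Number the vertices by preorder position (root = 1) and return the edge
    set of the resulting copy of the tree on {1, ..., size(root)}."""
    edges = set()

    def visit(v, number):
        # Returns the size of the subtree rooted at v.
        next_free = number + 1
        ordered = sorted(children.get(v, ()), key=lambda c: canon_key(children, c))
        for c in ordered:
            edges.add((number, next_free))
            next_free += visit(c, next_free)   # skip over the whole subtree of c
        return next_free - number

    visit(root, 1)
    return edges
```

Two isomorphic trees given with different vertex names then produce the same edge set; for a root with a leaf child and a child that itself has one leaf child, the result is `{(1, 2), (1, 3), (3, 4)}`, so the second child receives the number $2 + \mathrm{size}(v_1) = 3$.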

To apply the lrec operator, we define a graph $G = (V, E)$ with labels $C(v) \subseteq \mathbb{N}$ for each
$v \in V$ as follows. Let $V := V(T) \times N(T)^2$, where $(v, m, n) \in V$ stands for "$(m, n) \in X_v$".
If $v$ is a leaf, then $X_v$ should be empty, so for all $m, n \in N(T)$ we let $(v, m, n)$ have
no outgoing edges and define $C((v, m, n)) := \emptyset$. Suppose that $v$ is not a leaf and $w$ is a
child of $v$. Let $D_w$ be the set of all children $w'$ of $v$ with $w' \prec w$, and let $e_w$ be the
number of children $w'$ of $v$ with $T_w \cong T_{w'}$. For each $i \in [0, e_w - 1]$, the set $X_v$ will
contain an edge from 1 to $p_{w,i} := 2 + \sum_{w' \in D_w} \mathrm{size}(w') + i \cdot \mathrm{size}(w)$, and the edges in
$\{(p_{w,i} - 1 + m, p_{w,i} - 1 + n) \mid (m, n) \in X_w\}$. Hence we let $(v, 1, p_{w,i})$ have no outgoing edges
and define $C((v, 1, p_{w,i})) := \{0\}$. Furthermore, for all $m, n \in N(T)$ and all $i < e_w$, we let
$a := (v, p_{w,i} - 1 + m, p_{w,i} - 1 + n)$ have an edge to $(w, m, n)$ and define $C(a) := \{e_w\}$.

It is now easy to construct LREC-formulae $\varphi_E(x_1, p_1, p'_1, x_2, p_2, p'_2)$ and $\varphi_C(x_1, p_1, p'_1, q)$
that define the graph $G$ and the labels $C(\cdot)$. Let

\[
\gamma(p_1, p_2) := \exists x\, \exists \bar r\, \bigl( \text{"$x$ is the root"} \wedge [\mathrm{lrec}_{(x_1, p_1, p'_1), (x_2, p_2, p'_2), q}\ \varphi_E, \varphi_C]((x, p_1, p_2), \bar r) \bigr).
\]

Noting that the in-degree of each vertex $(v, m, n)$ is at most $e_v$, it is straightforward to show
that $\gamma$ defines an isomorphic copy of a directed tree:

Lemma 4.7. Let $X$ be the relation defined by $\gamma$ in $T$, let $v \in V(T)$ and let $X_v := \{(m, n) \mid
((v, m, n), \ell) \in X \text{ for some } \ell \ge \mathrm{size}(v)\}$. Then $T_v \cong ([|V(T_v)|], X_v)$.

Proof. The proof is by induction on $\mathrm{size}(v)$. Clearly, the lemma is true if $\mathrm{size}(v) = 1$.
Suppose that $\mathrm{size}(v) = s + 1$. By the induction hypothesis, for each child $w$ of $v$ we have
$T_w \cong ([|V(T_w)|], X_w)$.

Let $\ell \ge \mathrm{size}(v)$. Since for all children $w$ of $v$ and all $m, n \in N(T)$, the in-degree
of $(w, m, n)$ in $G$ is at most $e_w$ and $e_w \cdot \mathrm{size}(w) < \mathrm{size}(v)$ (which implies $\lfloor (\ell - 1)/e_w \rfloor \ge
\lfloor (\mathrm{size}(v) - 1)/e_w \rfloor \ge \mathrm{size}(w)$), we have

\[
\{(p_{w,i} - 1 + m, p_{w,i} - 1 + n) \mid (m, n) \in X_w\} \subseteq X_v \quad \text{for each child $w$ of $v$ and $i < e_w$.}
\]

Furthermore, by construction, we have $(1, p_{w,i}) \in X_v$ for each child $w$ of $v$ and $i < e_w$, and
there are no more edges. It is easy to see that $T_v \cong ([|V(T_v)|], X_v)$. □

Remark 4.8. The results of this section extend to coloured directed trees with a linear order
on the colours. To be precise, consider a directed tree $T$ and a total preorder $\le$ on $V(T)$.
Let $\preceq$ be as in Section 4.2. We define a refinement $\preceq'$ of $\preceq$ by letting $v \prec' w$ whenever
$v < w$, or $v \le w$ and $w \le v$ and $v \prec w$. It should be obvious how to modify $\varphi_{\prec}(x, y)$
to an LREC[$\{E, \le\}$]-formula $\varphi'_{\prec}(x, y)$ defining $\prec'$. Using this formula, we then obtain a
formula $\gamma'(p, q)$ such that $(V(T), E(T), \le) \cong ([|V(T)|], \gamma'[T; p, q], \le')$, where $m \le' n$ iff for
the vertices $v, w$ that correspond to $m, n$ we have $v \le w$.

Figure 5: The graph $G_3$. The gray areas highlight the different layers of $G_3$.

5. Inexpressibility of Reachability in Undirected Graphs 

While LREC captures LOGSPACE on directed trees, its expressive power still lacks the ability 
to define certain important problems on undirected graphs that can be defined easily in other 
logics such as STC with LOGSPACE data complexity. As an example, we show in this section 
that LREC cannot define reachability in undirected graphs: 

Theorem 5.1. There is no LREC[$\{E\}$]-formula $\varphi(x, y)$ such that for all undirected graphs
$G$ and all $v, w \in V(G)$, $G \models \varphi[v, w]$ iff there is a path from $v$ to $w$ in $G$.

As an immediate corollary we obtain:

Corollary 5.2. STC $\not\le$ LREC.

To prove Theorem 5.1, we show that reachability is not LREC-definable on a certain
class of directed graphs. This class, called $\mathcal{C}$ throughout this section, is defined in terms of
the following family of graphs $G_n$, for $n \ge 1$. Here, each graph $G_n$ consists of $2 \cdot n^2$ vertices,
which are partitioned into layers $V_1^1, \dots, V_n^1, V_1^2, \dots, V_n^2$ with $|V_i^j| = n$. Any two vertices in
consecutive layers $V_i^j$ and $V_{i+1}^j$ are connected by an edge. That is, the set $E(G_n)$ of edges
of $G_n$ is $\{(v, w) \in V_i^j \times V_{i+1}^j \mid i \in [n - 1],\ j \in [2]\}$. For example, the graph $G_3$ is shown in
Figure 5. Now, the class $\mathcal{C}$ is defined as:

\[
\mathcal{C} := \{ G \mid G \text{ is a graph such that } G \cong G_n \text{ for some } n \ge 1 \}.
\]
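To make the family $G_n$ concrete, the following sketch builds $G_n$ and tests reachability by breadth-first search. The encoding of a vertex as a triple (component $j$, layer $i$, index within the layer) and the function names are assumptions made for illustration; edges are stored symmetrically because the reachability question of Theorem 5.1 concerns undirected paths.

```python
from collections import deque


def build_layered_graph(n):
    """Adjacency sets of G_n: two components, each with n layers of n vertices,
    and all possible edges between consecutive layers of the same component."""
    vertices = [(j, i, t) for j in (1, 2) for i in range(1, n + 1) for t in range(n)]
    adj = {v: set() for v in vertices}
    for j in (1, 2):
        for i in range(1, n):
            for t in range(n):
                for u in range(n):
                    a, b = (j, i, t), (j, i + 1, u)
                    adj[a].add(b)
                    adj[b].add(a)   # stored symmetrically (undirected reading)
    return adj


def reachable(adj, source, target):
    """Breadth-first search from source; True iff target is reached."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```

In $G_3$, for example, a vertex of layer $V_1^1$ reaches every vertex of $V_3^1$ but no vertex of $V_3^2$, and swapping two vertices of the same layer leaves the adjacency structure unchanged.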

The key property of the graphs in $\mathcal{C}$ that enables us to show that reachability on $\mathcal{C}$ is
not LREC-definable is that they are rich in a certain kind of automorphisms. Indeed, let $v$
and $w$ be nodes occurring in the same layer of $G_n$. Then there is an automorphism of $G_n$
swapping $v$ and $w$, and fixing the remaining vertices point-wise. To see why this could be
useful at all, consider an LREC-formula $\varphi$ of the form $[\mathrm{lrec}_{\bar u_1, \bar u_2, \bar p}\ \varphi_E, \varphi_C](\bar w, \bar r)$, and suppose
we want to decide membership of a tuple $(\bar a_0, \ell_0)$ in the relation $X$ defined by $\varphi$ in $(G_n, \alpha)$,
for an assignment $\alpha$. First, we would compute the graph $G$ with vertex set $G_n^{\bar u_1}$ and edge
set $E$ defined by $\varphi_E$, and then we would recurse to decide which of the tuples $(\bar a_1, \ell_1)$, for
successor nodes $\bar a_1$ of $\bar a_0$ in $G$ and $\ell_1 = \lfloor (\ell_0 - 1)/|E\bar a_1| \rfloor$, belong to $X$. To decide membership
of each of the tuples $(\bar a_1, \ell_1)$ in $X$, we again have to recurse to decide which of the tuples
$(\bar a_2, \ell_2)$, for successor nodes $\bar a_2$ of $\bar a_1$ in $G$ and $\ell_2 = \lfloor (\ell_1 - 1)/|E\bar a_2| \rfloor$, belong to $X$, and so on.
Exploiting the above-mentioned automorphisms enables us to show that along each branch
$(\bar a_0, \ell_0), (\bar a_1, \ell_1), (\bar a_2, \ell_2), \dots$ of the "recursion tree", we see only a constant number of tuples
$(\bar a_{i+1}, \ell_{i+1})$, where $\bar a_{i+1}$ does not contain all the vertices of $G_n$ that occur in $\bar a_i$, or vice versa.
Thus, we are left with finitely many sub-branches "in between" those tuples that contain
the same vertices of $G_n$. If all those sub-branches had constant length, then the whole
"recursion tree" would have constant depth, so that we could easily find an FO+C-formula
that is equivalent to $\varphi$ on $\mathcal{C}$ (provided $\varphi_E$ and $\varphi_C$ are equivalent to FO+C-formulae). Since
reachability is not FO+C-definable on $\mathcal{C}$, this would immediately imply Theorem 5.1. In
general, the sub-branches do not have constant length (due to number variables that may
occur in $\bar u_1$ and $\bar u_2$), so that we move to a logic that is more expressive than FO+C, but
still lacks the ability to define reachability on $\mathcal{C}$.

More precisely, we show that on $\mathcal{C}$, every LREC[$\{E\}$]-formula is equivalent to a formula
in the infinitary counting logic $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$, introduced in [21] (see also [22, Section 8.2]). The
fact that $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-formulae without free number variables are Gaifman-local [21] then yields
that reachability is not $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-definable, and hence not LREC-definable, on $\mathcal{C}$.



5.1. The Logic $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$. Before delving into the details of translating LREC-formulae into
$\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-formulae, we give here a brief review of the logic $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$. For a detailed account,
we refer the reader to [21], or [22, Section 8.2].

$\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$ on the one hand extends FO+C by allowing for infinite disjunctions and
conjunctions, and on the other hand imposes restrictions so as to make the resulting logic not
too powerful. While in the context of FO+C, we equipped structures $A$ with a counting sort
$N(A) = [0, |V(A)|]$, in the context of $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$ we extend this counting sort to the set of all
natural numbers. Furthermore, $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-formulae may use any natural number $n \in \mathbb{N}$ as a
constant, which is always interpreted as $n$.

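$\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$ is a restriction of the extremely powerful logic $\mathcal{L}_{\infty\omega}(\mathbf{C})$, which is defined as
follows. A term $t$ is a structure variable, a number variable, or a non-negative integer; if $t$
is a structure variable, we call $t$ structure term, and otherwise number term. The atomic
formulae of $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$ have the form $R(x_1, \dots, x_r)$, where $R \in \tau$, $r$ is the arity of $R$, and
$x_1, \dots, x_r$ are structure variables; or $t = u$, where $t$ and $u$ are either structure terms or
number terms; or $t \le u$, where $t$ and $u$ are number terms. The set of all $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$-formulae
is the smallest set that contains all atomic formulae, and is closed under the
following formula formation rules:

(1) If $\varphi \in \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$, then $\neg\varphi \in \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$.

(2) If $\Phi \subseteq \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$, then $\bigvee \Phi$ and $\bigwedge \Phi$ belong to $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$.

(3) If $\varphi \in \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$ and $x$ is a variable, then $\exists x \varphi$ and $\forall x \varphi$ belong to $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$.

(4) If $\varphi \in \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$, $x$ is a structure variable, and $n \in \mathbb{N}$, then $\exists^{\ge n} x \varphi$ belongs to
$\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$.

(5) If $\varphi \in \mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$, $\bar x$ is a tuple of structure variables, and $\bar p$ is a tuple of number
terms, then $\#\bar x \varphi = \bar p$ belongs to $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$.

Note that, in contrast to FO+C, $\mathcal{L}_{\infty\omega}(\mathbf{C})$ restricts us to tuples of structure variables in
counting formulae $\#\bar x \varphi = \bar p$. The semantics of $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$-formulae constructed as in (1), (3)
and (5) is as usual. The semantics of formulae of the form $\bigvee \Phi$ or $\bigwedge \Phi$ is "at least one $\varphi \in \Phi$
is satisfied" and "all $\varphi \in \Phi$ are satisfied", respectively. Formulae of the form $\exists^{\ge n} x \varphi$ have
the meaning "there are at least $n$ assignments to $x$ for which $\varphi$ is satisfied".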

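$\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})[\tau]$-formulae are those $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$-formulae whose rank is bounded. Here, the
rank $\mathrm{rk}(\varphi)$ of a $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$-formula $\varphi$ is defined as follows. For atomic formulae $\varphi$ we have
$\mathrm{rk}(\varphi) = 0$. Furthermore, $\mathrm{rk}(\neg\varphi) = \mathrm{rk}(\varphi)$, $\mathrm{rk}(\bigvee \Phi) = \mathrm{rk}(\bigwedge \Phi) = \sup_{\varphi \in \Phi} \mathrm{rk}(\varphi)$, $\mathrm{rk}(\exists x \varphi) =
\mathrm{rk}(\forall x \varphi) = \mathrm{rk}(\exists^{\ge n} x \varphi) = 1 + \mathrm{rk}(\varphi)$ if $x$ is a structure variable, $\mathrm{rk}(\exists x \varphi) = \mathrm{rk}(\forall x \varphi) = \mathrm{rk}(\varphi)$
if $x$ is a number variable, and $\mathrm{rk}(\#\bar x \varphi = \bar p) = |\bar x| + \mathrm{rk}(\varphi)$. Now, a $\mathcal{L}_{\infty\omega}(\mathbf{C})[\tau]$-formula $\varphi$
belongs to $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})[\tau]$ if there is a number $n \in \mathbb{N}$ with $\mathrm{rk}(\varphi) \le n$.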
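The rank is a plain structural recursion, which the following sketch spells out on a toy syntax tree. The class names and fields are assumptions made for illustration, and disjunctions and conjunctions are restricted to finite tuples of subformulae, so the supremum becomes a maximum; the infinitary case is not representable this way.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:                 # R(x_1, ..., x_r), t = u, or t <= u
    text: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Junction:             # finite stand-in for a disjunction or conjunction
    subs: Tuple["Formula", ...]


@dataclass(frozen=True)
class Quant:                # exists x, forall x, or "at least n" x
    var_is_structure: bool
    sub: "Formula"


@dataclass(frozen=True)
class Count:                # counting formula over a tuple of structure variables
    num_vars: int
    sub: "Formula"


Formula = Union[Atom, Not, Junction, Quant, Count]


def rank(phi: Formula) -> int:
    """Rank rk(phi): number variables are free, structure variables cost 1 each."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, Not):
        return rank(phi.sub)
    if isinstance(phi, Junction):
        return max((rank(s) for s in phi.subs), default=0)
    if isinstance(phi, Quant):
        return rank(phi.sub) + (1 if phi.var_is_structure else 0)
    if isinstance(phi, Count):
        return phi.num_vars + rank(phi.sub)
    raise TypeError(f"unknown formula node: {phi!r}")
```

Quantifying a number variable leaves the rank unchanged, which is why unboundedly many number quantifiers are harmless for membership in the bounded-rank fragment.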

As shown in [21], every $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-formula without free number variables is Gaifman
local. To make this precise, we need some more notation. Given an undirected graph $G$
and vertices $v, w \in V(G)$, let $\mathrm{dist}^G(v, w)$ denote the length of a shortest path from $v$ to $w$
in $G$, or $\infty$ if there is no such path. For all $k \ge 1$, all tuples $\bar v = (v_1, \dots, v_k) \in V(G)^k$ and
all $r \in \mathbb{N}$, let $B_r^G(\bar v) := \{w \in V(G) \mid \exists i \in [k]\colon \mathrm{dist}^G(v_i, w) \le r\}$, and define $N_r^G(\bar v)$ to be
the subgraph of $G$ induced by $B_r^G(\bar v)$. The following theorem is stated in [21] for arbitrary
vocabularies:

Theorem 5.3 (Restricted form of a theorem in [21]). For every $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})[\{E\}]$-formula
$\varphi(\bar x)$ without free number variables, there is an $r \in \mathbb{N}$ such that for all graphs $G$ and all
$\bar a, \bar b \in V(G)^{|\bar x|}$ with $(N_r^G(\bar a), \bar a) \cong (N_r^G(\bar b), \bar b)$ we have: $G \models \varphi[\bar a] \iff G \models \varphi[\bar b]$.

Using Theorem 5.3, it is straightforward to show that:

Corollary 5.4. There is no $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})[\{E\}]$-formula $\varphi(x, y)$ such that for all $G \in \mathcal{C}$ and all
$v, w \in V(G)$ we have $G \models \varphi[v, w]$ iff there is a path from $v$ to $w$ in $G$.

Proof. For a contradiction, suppose that $\varphi(x, y)$ is an $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})[\{E\}]$-formula such that for
all $G \in \mathcal{C}$ and all $v, w \in V(G)$ we have $G \models \varphi[v, w]$ iff there is a path from $v$ to $w$ in $G$.
Let $r \in \mathbb{N}$ be as guaranteed by Theorem 5.3. We can now pick vertices $v, w_1, w_2 \in V(G_{r+2})$
with $N_r^{G_{r+2}}(v, w_1) \cong N_r^{G_{r+2}}(v, w_2)$ such that $w_1$ is reachable from $v$, but $w_2$ is not reachable
from $v$. Since $G_{r+2} \models \varphi[v, w_1]$, we then have $G_{r+2} \models \varphi[v, w_2]$, a contradiction. □

5.2. Translation of LREC-Formulae Into $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-Formulae. We now describe the
translation of an LREC-formula $\varphi$ into an $\mathcal{L}^{*}_{\infty\omega}(\mathbf{C})$-formula that is equivalent to $\varphi$ on $\mathcal{C}$.
The translation proceeds by induction on the structure of $\varphi$, where the only interesting case
is that of LREC-formulae $\varphi$ of the form

\[
[\mathrm{lrec}_{\bar u_1, \bar u_2, \bar p}\ \varphi_E, \varphi_C](\bar w, \bar r).
\]

To decide whether $\varphi$ holds in a given graph $G_n$ under an assignment $\alpha$, $\varphi$ needs to check
whether the tuple $(\bar a_0, \ell_0)$, for $\bar a_0 := \alpha(\bar w)$ and $\ell_0 := \langle \alpha(\bar r) \rangle$, belongs to the relation $X$
defined by $\varphi$ in $(G_n, \alpha)$. To this end, it looks at the graph $G$ with vertex set $G_n^{\bar u_1}$ and edge
set $\varphi_E[G_n, \alpha; \bar u_1, \bar u_2]$, or rather at its $\ell_0$-unravelling $G^{(\bar a_0, \ell_0)}$ at $\bar a_0$:

Definition 5.5. The $\ell$-unravelling of a graph $G = (V,E)$ at a vertex $v \in V$ is the tree $G^{(v,\ell)}$
defined as follows:

(1) The nodes of $G^{(v,\ell)}$ are all finite sequences $((v_0,\ell_0),\dots,(v_n,\ell_n))$, where $(v_0,\ell_0) =
(v,\ell)$, $(v_0,\dots,v_n)$ is a path in $G$, and $\ell_i = \lfloor(\ell_{i-1} - 1)/|Ev_i|\rfloor$ for every $i \in [n]$.

(2) There is an edge from a node $((v_0,\ell_0),\dots,(v_m,\ell_m))$ to a node $((v'_0,\ell'_0),\dots,(v'_n,\ell'_n))$
whenever $n = m + 1$, and $(v'_i,\ell'_i) = (v_i,\ell_i)$ for every $i \le m$.

(3) Each node $((v_0,\ell_0),\dots,(v_m,\ell_m))$ is labelled with $(v_m,\ell_m)$.
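
As an informal illustration of Definition 5.5 (not part of the formal development), the following Python sketch enumerates the nodes of an $\ell$-unravelling of a small directed graph. The function name `unravel`, the edge-list input format, and the cut-off that stops extending a node once its resource value is no longer positive are assumptions of this sketch; it also follows arbitrary walks instead of checking that the visited vertices form a path.

```python
from collections import defaultdict


def unravel(edges, v, ell):
    """Enumerate the nodes of the ell-unravelling of a digraph at vertex v.

    edges: iterable of directed edges (u, w).
    A node is a tuple ((v_0, l_0), ..., (v_n, l_n)); its label is the last pair.
    Assumption of this sketch: a node is extended only while its resource
    value is positive, so the enumeration terminates on cyclic graphs too.
    """
    succ = defaultdict(list)  # out-neighbours of each vertex
    indeg = defaultdict(int)  # |Ew|: number of in-neighbours of w
    for u, w in set(edges):
        succ[u].append(w)
        indeg[w] += 1

    nodes = []
    stack = [((v, ell),)]
    while stack:
        node = stack.pop()
        nodes.append(node)
        last, resource = node[-1]
        if resource <= 0:
            continue  # assumed cut-off, see the docstring
        for w in sorted(succ[last]):
            child_resource = (resource - 1) // indeg[w]  # floor((l - 1) / |Ew|)
            stack.append(node + ((w, child_resource),))
    return nodes


# Hypothetical example graph: a -> b, a -> c, b -> c.
example_nodes = unravel([("a", "b"), ("a", "c"), ("b", "c")], "a", 5)
example_labels = {node[-1] for node in example_nodes}
```

In the example, the vertex `c` has two in-neighbours, so the resource value is halved (rounded down) whenever a walk enters `c`, while entering `b` only decreases it by one.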

For each node of $G^{(\bar a_0,\ell_0)}$, $\hat\varphi$ checks whether its label belongs to $X$. Clearly, this suffices to
decide whether $(\bar a_0,\ell_0) \in X$.

Our construction is based on the following property of $G^{(\bar a_0,\ell_0)}$:

Lemma 5.6. Let $\varphi_E(\bar x,\bar y,\bar z)$ be a formula, where $\bar x,\bar y$ are compatible, let $n > |\bar x| + |\bar z| + 2$,
let $\alpha$ be an assignment for $\varphi_E$ in $G_n$, and let $G = (V,E)$ be the graph with $V := G_n^{\bar x}$ and $E :=
\varphi_E[G_n,\alpha;\bar x,\bar y]$. Consider a node $((\bar a_0,\ell_0),\dots,(\bar a_m,\ell_m))$ in $G^{(\bar a_0,\ell)}$, where $\ell \le |N(G_n)|^r - 1$.
Then, the size of

$$I := \{ i \in [m] \mid (\bar a_{i-1} \cup \alpha(\bar z)) \cap V(G_n) \neq (\bar a_i \cup \alpha(\bar z)) \cap V(G_n) \}$$

is bounded by a constant that depends only on $\varphi_E$ and $r$.

Proof. We first show that the size of

$$K := \{ i \in I \mid \bar a_{i-1} \cap V(G_n) \not\subseteq (\bar a_i \cup \alpha(\bar z)) \cap V(G_n) \}$$

is bounded by a constant that only depends on $\varphi_E$ and $r$. To this end, consider an $i \in K$,
and a $b \in \bar a_{i-1} \cap V(G_n)$ such that $b \notin \bar a_i \cup \alpha(\bar z)$. Let us call an element $b' \in V(G_n)$ a sibling
of $b$ if $b$ and $b'$ belong to the same layer in $G_n$. There are at least

$$n - |\bar a_i \cup \alpha(\bar z)| - 1 \ \ge\ n - (|\bar x| + |\bar z| + 1)$$

siblings of $b$ in $G_n$ that do not occur in $\bar a_i \cup \alpha(\bar z) \cup \{b\}$. Each such sibling $b'$ gives rise to
an automorphism $f_{b'}\colon V(G_n) \to V(G_n)$ of $G_n$ that fixes all the vertices in $V(G_n) \setminus \{b,b'\}$
point-wise, maps $b$ to $b'$, and maps $b'$ to $b$. As a consequence, for each such sibling $b'$ we have
$f_{b'}(\bar a_{i-1})\bar a_i \in E$, where $f_{b'}(\bar a_{i-1})$ is the tuple obtained from $\bar a_{i-1}$ by replacing each element
$b''$ in $\bar a_{i-1}$ that belongs to $V(G_n)$ with $f_{b'}(b'')$. This implies

$$|E\bar a_i| \ge n - d_1,$$

where $d_1 := |\bar x| + |\bar z| + 1$ depends only on $\varphi_E$.

Observe that, by the definition of $G^{(\bar a_0,\ell)}$, we have $\ell_0 = \ell \le |N(G_n)|^r - 1 \le (2n)^{2r}$ and
$\ell_0 \ge \prod_{i=1}^m |E\bar a_i|$. Hence,

$$(2n)^{2r} \ \ge\ \prod_{i=1}^m |E\bar a_i| \ \ge\ \prod_{i \in K} |E\bar a_i| \ \ge\ (n - d_1)^{|K|}.$$

For $n > d_1 + 1$ this implies $|K| \le \log_{n-d_1} (2n)^{2r} \le 2r(1 + \log_{n-d_1} n)$, which is bounded by a
constant $d_2$ that only depends on $\varphi_E$ and $r$.

To conclude the proof, consider a maximal set $I' \subseteq I$ such that there are no $i,i' \in I'$
and $k \in K$ with $i < k < i'$. We show that $|I'|$ is bounded by a constant $d_3$ that depends
only on $\varphi_E$. This then implies the lemma as

$$|I| \le (|K| + 1) \cdot (d_3 + 1) \le (d_2 + 1) \cdot (d_3 + 1).$$

Let $i_{\min} := \min I'$ and $i_{\max} := \max I'$, and notice that

$$(\bar a_{i_{\min}-1} \cup \alpha(\bar z)) \cap V(G_n) \subseteq (\bar a_{i_{\min}} \cup \alpha(\bar z)) \cap V(G_n) \subseteq \dots \subseteq (\bar a_{i_{\max}} \cup \alpha(\bar z)) \cap V(G_n).$$

Since $(\bar a_{i_{\max}} \cup \alpha(\bar z)) \cap V(G_n)$ contains at most $d_3 := |\bar x|$ elements that do not belong to
$(\bar a_{i_{\min}-1} \cup \alpha(\bar z)) \cap V(G_n)$, there are at most $d_3$ indices $i \in [i_{\min},i_{\max}]$ with $(\bar a_{i-1} \cup \alpha(\bar z)) \cap
V(G_n) \subsetneq (\bar a_i \cup \alpha(\bar z)) \cap V(G_n)$. Hence, $|I'| \le d_3$, as desired. □
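
The logarithmic estimate for $|K|$ used in the proof can be spelled out as follows. The explicit value given for the constant in the last line is our own reading of "bounded by a constant" and is not stated in the text; it uses that $n \mapsto \log_{n-d_1} n$ is decreasing.

```latex
\begin{align*}
|K| &\le \log_{n-d_1} (2n)^{2r}
     = 2r \left( \log_{n-d_1} 2 + \log_{n-d_1} n \right) \\
    &\le 2r \left( 1 + \log_{n-d_1} n \right)
     && \text{since } n - d_1 \ge 2 \\
    &\le 2r \left( 1 + \log_2 (d_1 + 2) \right)
     && \text{since } \log_{n-d_1} n \text{ is decreasing in } n \text{ for } n \ge d_1 + 2 .
\end{align*}
```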

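We are now ready to prove that on $\mathcal{C}$, every LREC[$\{E\}$]-formula is equivalent to an
$\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formula.

Lemma 5.7. For every LREC[$\{E\}$]-formula $\varphi(\bar x)$, there is an $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formula $\hat\varphi(\bar x)$
such that for all $G \in \mathcal{C}$ and all $\bar a \in G^{\bar x}$, we have: $G \models \varphi[\bar a] \iff G \models \hat\varphi[\bar a]$.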

Proof. As mentioned above, we proceed by induction on the structure of $\varphi$. The only
interesting case is that of an LREC[$\{E\}$]-formula of the form

$$\varphi = [\mathrm{lrec}_{\bar u_1,\bar u_2,\bar p}\ \varphi_E, \varphi_C](\bar w,\bar r).$$

Let $\bar v_E$ be an enumeration of all variables in $\mathrm{free}(\varphi_E)$ that are not listed in $\bar u_1\bar u_2$, and let $\bar v_C$
be an enumeration of all variables in $\mathrm{free}(\varphi_C)$ that are not listed in $\bar u_1\bar p$.

We aim to construct, for all integers $n \ge 1$ and $\ell \le |N(G_n)|^{|\bar r|} - 1$, an $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-
formula $\psi_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ such that for all assignments $\alpha$ in $G_n$, and all $\bar a \in G_n^{\bar u_1}$,

$$G_n \models \psi_{n,\ell}[\bar a,\alpha(\bar v_E),\alpha(\bar v_C)] \iff (\bar a,\ell) \in X,$$

where $X$ is the relation defined by $\varphi$ in $(G_n,\alpha)$. Furthermore, the rank of each $\psi_{n,\ell}$ will be
bounded by a constant that depends only on $\varphi$, so that

$$\hat\varphi := \bigvee_{\substack{n \ge 1 \\ \ell < (2n^2+1)^{|\bar r|}}} \Big( \text{``the universe has size } 2n^2\text{''} \wedge \text{``}\bar r \text{ represents the number } \ell\text{''} \wedge \psi_{n,\ell}(\bar w,\bar v_E,\bar v_C) \Big)$$

is an $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formula that is equivalent to $\varphi$ on $\mathcal{C}$.

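Construction of $\psi_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$: Fix $n \ge 1$ and $\ell \le |N(G_n)|^{|\bar r|} - 1$. To simplify the presentation,
we also fix an assignment $\alpha$ in $G_n$, and the graph $G = (V,E)$ with $V := G_n^{\bar u_1}$ and
$E := \varphi_E[G_n,\alpha;\bar u_1,\bar u_2]$; the formula $\psi_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ we are going to construct will however not
depend on $\alpha$. For every $\bar a \in V$, let

$$t_{n,\ell}(\bar a) := \max\{ t \in \mathbb{N} \mid \text{there is a node } ((\bar a_0,\ell_0),\dots,(\bar a_m,\ell_m)) \text{ in } G^{(\bar a,\ell)} \text{ such that } t \text{ equals }
|\{ i \in [m] \mid (\bar a_{i-1} \cup \alpha(\bar v_E)) \cap V(G_n) \neq (\bar a_i \cup \alpha(\bar v_E)) \cap V(G_n) \}| \}.$$

By Lemma 5.6, there is a constant $t^*$ that only depends on $\varphi$ such that

$$t_{n,\ell}(\bar a) < t^* \quad \text{for all } \bar a \in V.$$

In what follows, we construct, for all $t \le t^*$, an $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formula $\psi^t_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ such
that for all $\bar a \in V$ with $t_{n,\ell}(\bar a) < t$, we have:

$$G_n \models \psi^t_{n,\ell}[\bar a,\alpha(\bar v_E),\alpha(\bar v_C)] \iff (\bar a,\ell) \in X,$$

where $X$ is the relation defined by $\varphi$ in $(G_n,\alpha)$. Furthermore, the rank of $\psi^t_{n,\ell}$ will not
depend on $n$ or $\ell$. The desired formula $\psi_{n,\ell}$ can then be defined as: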

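Construction of $\psi^t_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$: We construct the formulae $\psi^t_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ by induction on
$t$. For $t = 0$, we define $\psi^0_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ to be an arbitrary unsatisfiable formula. The idea for
the construction of $\psi^t_{n,\ell}(\bar u_1,\bar v_E,\bar v_C)$ is as follows. Let $\bar a \in V$, and

$$Q(\bar a) := \{ (\bar a_m,\ell_m) \mid ((\bar a_0,\ell_0),\dots,(\bar a_m,\ell_m)) \in V(G^{(\bar a,\ell)}), \text{ and for all } i \in [m] \text{ we have: }
(\bar a_{i-1} \cup \alpha(\bar v_E)) \cap V(G_n) = (\bar a_i \cup \alpha(\bar v_E)) \cap V(G_n) \}.$$

To check whether $(\bar a,\ell) \in X$, we "guess" the set $X' = Q(\bar a) \cap X$, and then simply check
whether $(\bar a,\ell) \in X'$. To guess $X'$, we can use an infinite disjunction over all subsets $R$ of
$Q(\bar a)$. Then we only need to verify for each $R$ whether $R$ indeed corresponds to $X'$. For
the latter, we count, for each pair $(\bar a',\ell') \in Q(\bar a)$, the number of pairs $(\bar a'',\ell'')$ such that
$\bar a'\bar a'' \in E$, $\ell'' = \lfloor(\ell' - 1)/|E\bar a''|\rfloor$ and $(\bar a'',\ell'') \in X$, and check that $(\bar a',\ell') \in R$ whenever this
number belongs to the label of $\bar a'$ defined by $\varphi_C$. How do we check whether $(\bar a'',\ell'') \in X$?
If $(\bar a' \cup \alpha(\bar v_E)) \cap V(G_n) = (\bar a'' \cup \alpha(\bar v_E)) \cap V(G_n)$, that is, if $(\bar a'',\ell'') \in Q(\bar a)$, then we simply
check whether $(\bar a'',\ell'') \in R$. Otherwise, we use the formula $\psi^{t-1}_{n,\ell''}$.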

Let $\varphi'_E$ and $\varphi'_C$ be $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formulae that are equivalent to $\varphi_E$ and $\varphi_C$, respectively.
Such formulae exist by the induction hypothesis. Using $\varphi'_E$ it is easy to construct, for each
$\ell' \in [0,\ell]$, an $\mathcal{L}^*_{\infty\omega}(\mathbf{C})[\{E\}]$-formula $\chi_{\ell'}(\bar u_1,\bar u'_1,\bar v_E)$ such that for all $\bar a,\bar a' \in G_n^{\bar u_1}$,

$$G_n \models \chi_{\ell'}[\bar a,\bar a',\alpha(\bar v_E)] \iff (\bar a',\ell') \in Q(\bar a).$$

Here, $\bar u'_1$ is a tuple of variables that is compatible with, but disjoint from $\bar u_1$.
Let $Q'$ be the set of all pairs $(\bar u',\ell')$, where $\ell' \in [0,\ell]$, and $\bar u'$ is obtained from $\bar u'_1$ by
replacing each structure variable with a structure variable from $\bar u_1$ and each number variable
with an integer from $N(G_n) = [0,2n^2]$. Intuitively, each $R \subseteq Q'$ corresponds to a guess of
$Q(\bar a) \cap X$ as described above. For each $R \subseteq Q'$, let $\psi^t_{n,\ell,R}(\bar u_1,\bar v_E,\bar v_C)$ be



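$$\bigwedge_{(\bar u',\ell') \in Q'} \Big[ \chi_{\ell'}(\bar u_1,\bar u',\bar v_E) \rightarrow \Big( \exists \bar p \big( \varphi'_C(\bar u',\bar p,\bar v_C) \wedge \#\bar u'' ( \varphi'_E(\bar u',\bar u'',\bar v_E) \wedge \vartheta_{R,\bar u',\ell'}(\bar u'') ) = \bar p \big) \leftrightarrow \text{``}(\bar u',\ell') \in R\text{''} \Big) \Big],$$

where

$$\vartheta_{R,\bar u',\ell'}(\bar u'') := \bigvee_{\ell'' \in [0,\ell']} \Big( \text{``}\ell'' = \lfloor (\ell' - 1)/|E\bar u''| \rfloor\text{''} \wedge \big[ (\chi_{\ell''}(\bar u',\bar u'',\bar v_E) \wedge \text{``}(\bar u'',\ell'') \in R\text{''}) \vee \psi^{t-1}_{n,\ell''}(\bar u'',\bar v_E,\bar v_C) \big] \Big)$$

and "$(\bar u'',\ell'') \in R$" stands for $\bigvee_{(\bar u^*,\ell^*) \in R,\ \ell^* = \ell''} \bar u^* = \bar u''$. Then it is not hard to see that the
formula

$$\bigvee_{\substack{R \subseteq Q' \\ (\bar u_1,\ell) \in R}} \psi^t_{n,\ell,R}(\bar u_1,\bar v_E,\bar v_C)$$

is as desired. Clearly, the rank of $\psi^t_{n,\ell}$ does not depend on $n$ or $\ell$.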

□

To conclude this section, note that Theorem 5.1 follows immediately from Lemma 5.7
and Corollary 5.4.



6. An Extension of LREC 

The proof of the previous section's Theorem 5.1 indicates that LREC is not closed under
logical reductions, not even under very simple first-order reductions.¹ Indeed, it is easy to
see that there is a first-order reduction that maps a graph $G_n$, for $n \ge 3$, as defined in
the previous section, to a disjoint union $G'_n$ of two directed paths on $n$ vertices each, by identifying
vertices in the same layer. Reachability on the class of all graphs isomorphic to $G'_n$ for
an $n \ge 3$ is easily seen to be LREC-definable. Hence, if LREC were closed under first-order
reductions, then reachability on the class of all graphs isomorphic to $G_n$ for some $n$ would
be LREC-definable, contradicting the previous section's results.

In this section, we introduce an extension LREC= of LREC whose data complexity is
still in LOGSPACE, and thus captures LOGSPACE on directed trees, while being closed
under logical reductions. The idea is to admit a third formula $\varphi_=$ in the lrec-operator that
generates an equivalence relation on the vertices of the graph defined by $\varphi_E$.

Let $\tau$ be a vocabulary. The set of all LREC=[$\tau$]-formulae is obtained from LREC[$\tau$] by
replacing the rule for the lrec-operator from Section 3 as follows: If $\bar u,\bar v,\bar w$ are compatible
tuples of variables, $\bar p,\bar r$ are non-empty tuples of number variables, and $\varphi_=$, $\varphi_E$ and $\varphi_C$ are
LREC=-formulae, then the following is an LREC=[$\tau$]-formula:

$$\varphi := [\mathrm{lrec}_{\bar u,\bar v,\bar p}\ \varphi_=, \varphi_E, \varphi_C](\bar w,\bar r). \qquad (6.1)$$

We let $\mathrm{free}(\varphi) := (\mathrm{free}(\varphi_=) \setminus (\bar u \cup \bar v)) \cup (\mathrm{free}(\varphi_E) \setminus (\bar u \cup \bar v)) \cup (\mathrm{free}(\varphi_C) \setminus (\bar u \cup \bar p)) \cup \bar w \cup \bar r$.

¹ We defer the definition of logical reductions and what it means to be closed under logical reductions to
Definition 6.4 and Lemma 6.6. For first-order reductions, see also [5].

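To define the semantics of LREC=[$\tau$]-formulae $\varphi$ of the form (6.1), let $A$ be a $\tau$-structure
and $\alpha$ an assignment in $A$. Let $V_0 := A^{\bar u}$ and $E_0 := \varphi_E[A,\alpha;\bar u,\bar v]$. We define $\sim$ to be the
reflexive, symmetric, transitive closure of the binary relation $\varphi_=[A,\alpha;\bar u,\bar v]$ over $V_0$. Now
consider the graph $G = (V,E)$ with

$$V := V_0/{\sim} \quad \text{and} \quad E := \{ (\bar a/{\sim}, \bar b/{\sim}) \in V^2 \mid \bar a\bar b \in E_0 \}.$$

To every $\bar a/{\sim} \in V$ we assign the set

$$C(\bar a/{\sim}) := \{ \langle \bar n \rangle \mid \text{there is an } \bar a' \in \bar a/{\sim} \text{ with } \bar n \in \varphi_C[A,\alpha[\bar a'/\bar u];\bar p] \}$$

of labels. Then the definition of $X$ can be taken verbatim from the definition for LREC. We let $(A,\alpha) \models \varphi$
if and only if $(\alpha(\bar w)/{\sim}, \langle \alpha(\bar r) \rangle) \in X$. As for LREC, we have: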

Theorem 6.1. For every vocabulary $\tau$, and every LREC=[$\tau$]-formula $\varphi$ there is a deterministic
logspace Turing machine that, given a $\tau$-structure $A$ and an assignment $\alpha$ in $A$,
decides whether $(A,\alpha) \models \varphi$.

Sketch. The proof is a straightforward modification of the proof of Theorem 3.4. The only
difference is that, when we deal with LREC=-formulae of the form (6.1), we use the vertex set
$V$, the edge set $E$, and the labels $C(\cdot)$ as defined above to compute the set $X$. It is easy to
compute these sets by first computing the relation $\sim$ from $\varphi_=[A,\alpha;\bar u,\bar v]$ using Reingold's
logspace algorithm for undirected reachability [23]. Note that once $\sim$ has been obtained,
the equivalence class of every element $\bar a \in A^{\bar u}$ can be determined. □

The following example shows that undirected graph reachability is definable in LREC = . 
This does not involve an implementation of Reingold's algorithm in our logic, but just 
uses the observation that the computation of the equivalence relation ~ boils down to the 
computation of undirected reachability. 

Example 6.2 (Undirected reachability). The following LREC=-formula defines undirected
graph reachability:

$$\varphi(s,t) := [\mathrm{lrec}_{x,y,p}\ \varphi_=(x,y), \varphi_E(x,y), \varphi_C(x,p)](s,1),$$

where $\varphi_=(x,y) := E(x,y)$, $\varphi_E(x,y) := \neg x = x$ and $\varphi_C(x,p) := x = t$. To see this, let $G$
be an undirected graph and $\alpha$ an assignment in $G$. Define $\sim$, $V$, $E$, $C$ and the set $X$ as
above. Clearly, the set $V$ consists of the connected components of $G$. Furthermore, the set
$E$ is empty since $\varphi_E$ is unsatisfiable. Therefore, for all $v \in V(G)$ we have $(v/{\sim},1) \in X$ iff
$0 \in C(v/{\sim})$. The latter is true precisely if $\alpha(t) \in v/{\sim}$, i.e., if $v$ and $\alpha(t)$ are in the same
connected component of $G$. It follows that for all $v,w \in V(G)$ we have $G \models \varphi[v,w]$ if and
only if $v$ and $w$ are in the same connected component of $G$, that is, if there is a path from
$v$ to $w$ in $G$. □
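
To make the evaluation in Example 6.2 concrete, the following Python sketch computes the equivalence classes generated by the edge relation and then evaluates the recursion for $X$ at resource value 1. All function names are hypothetical. Two points are assumptions of this sketch rather than statements of this section: the recursion for $X$ (a pair $(c,\ell)$ is in $X$ iff $\ell > 0$ and the number of successors $d$ of $c$ with $(d,\lfloor(\ell-1)/|Ed|\rfloor) \in X$ is a label of $c$) is our reading of the LREC semantics, and the number sort is taken to be $[0,|V(G)|]$. The sketch uses a simple union-find instead of Reingold's algorithm, so it says nothing about space usage.

```python
def equivalence_classes(universe, generator):
    """Classes of the reflexive, symmetric, transitive closure of `generator`."""
    parent = {a: a for a in universe}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in generator:
        parent[find(a)] = find(b)
    return {a: frozenset(b for b in universe if find(b) == find(a))
            for a in universe}


def in_X(cls, ell, succ, indeg, labels):
    """Assumed recursion for X on the quotient graph (see the lead-in)."""
    if ell <= 0:
        return False
    count = sum(1 for d in succ.get(cls, ())
                if in_X(d, (ell - 1) // indeg[d], succ, indeg, labels))
    return count in labels[cls]


def reachable(vertices, edges, s, t):
    """Evaluate the formula of Example 6.2 at (s, t) on an undirected graph."""
    cls = equivalence_classes(vertices, edges)  # phi_= is the edge relation
    succ, indeg = {}, {}                        # phi_E is unsatisfiable: E is empty
    numbers = set(range(len(vertices) + 1))     # assumed number sort [0, |V|]
    # phi_C(x, p) is "x = t": a class carries every number as a label iff it contains t.
    labels = {c: (numbers if t in c else set()) for c in set(cls.values())}
    return in_X(cls[s], 1, succ, indeg, labels)


# Hypothetical example: two components {1, 2, 3} and {4, 5}.
example_vertices = [1, 2, 3, 4, 5]
example_edges = [(1, 2), (2, 3), (4, 5)]
```

Since the quotient graph has no edges, the count of successors in $X$ is always 0, so the answer depends only on whether 0 is a label of the class of `s`, exactly as in the argument above.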

Remark 6.3. It follows immediately from the previous example that STC+C $\le$ LREC=.
Actually, the containment is strict, because LREC $\not\le$ STC+C by Corollary 4.6. Since in
STC+C (and actually in STC) it is possible to transform trees into directed trees, the
results from Section 4 imply that LREC= captures LOGSPACE on the class of all trees,
directed as well as undirected. Note also that LREC= $\le$ FP+C.
To conclude this section, we show that LREC= is closed under logical reductions. We
first introduce L-transductions (also known as L-interpretations [S]):

Definition 6.4 (Transduction). Let L be a logic, let $\tau_1,\tau_2$ be vocabularies and let $\ell \ge 1$.

(1) An $\ell$-ary L[$\tau_1,\tau_2$]-transduction is a tuple

$$\Theta = \big( \theta_V(\bar u),\ \theta_\sim(\bar u,\bar v),\ (\theta_R(\bar u_{R,1},\dots,\bar u_{R,\mathrm{ar}(R)}))_{R \in \tau_2} \big)$$

of L[$\tau_1$]-formulae, where $\bar u,\bar v$ are compatible $\ell$-tuples of variables and for every $R \in \tau_2$
and $i \in [\mathrm{ar}(R)]$, $\bar u_{R,i}$ is an $\ell$-tuple of variables that is compatible to $\bar u$.

(2) Let $A$ be a $\tau_1$-structure such that $\theta_V[A;\bar u]$ is non-empty. We define a $\tau_2$-structure
$\Theta[A]$ as follows. We let $\sim$ be the reflexive, symmetric, transitive closure of the binary
relation $\theta_\sim[A;\bar u,\bar v]$, and call $\sim$ the equivalence relation generated by $\theta_\sim[A;\bar u,\bar v]$. Let

$$V(\Theta[A]) := \theta_V[A;\bar u]/{\sim}$$

and for each $R \in \tau_2$, let

$$R(\Theta[A]) := \{ (\bar a_1/{\sim},\dots,\bar a_{\mathrm{ar}(R)}/{\sim}) \mid
\bar a_1,\dots,\bar a_{\mathrm{ar}(R)} \in \theta_V[A;\bar u],\ A \models \theta_R[\bar a_1,\dots,\bar a_{\mathrm{ar}(R)}] \}.$$

So, informally, an L[$\tau_1,\tau_2$]-transduction defines a mapping from structures over the first
vocabulary, $\tau_1$, into structures over the second vocabulary, $\tau_2$, via L[$\tau_1$]-formulae.

Example 6.5. Consider the FO[$\{E\},\{E\}$]-transduction $\Theta = (\theta_V(x), \theta_\sim(x,y), \theta_E(x,y))$
with $\theta_V(x) := x = x$, $\theta_\sim(x,y) := \forall z(E(x,z) \leftrightarrow E(y,z))$ and $\theta_E(x,y) := E(x,y)$. Recall
the definition of the graphs $G_n$ from the previous section. For $n \ge 3$, the equivalence relation $\sim$
generated by $\theta_\sim[G_n;x,y]$ is $\theta_\sim[G_n;x,y]$ itself. It relates any two vertices that occur in the
same layer of $G_n$. Hence, for $n \ge 3$, $\Theta[G_n]$ is the disjoint union of two paths of length $n$. □
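
The quotient construction of Definition 6.4 can be sketched in Python for the special case of graphs with $\ell = 1$ and $\theta_V$ true everywhere. The function names and the small layered graph below are hypothetical; in particular, the example graph is a toy stand-in chosen for this sketch and is not claimed to be one of the graphs $G_n$.

```python
def quotient_graph(vertices, edges, related):
    """Collapse a directed graph along the equivalence generated by `related`.

    vertices: list of vertices; edges: set of pairs (u, v);
    related: function (x, y) -> bool playing the role of theta_~.
    Returns the set of classes and the induced edge set between classes.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x in vertices:
        for y in vertices:
            if related(x, y):
                parent[find(x)] = find(y)

    cls = {v: frozenset(w for w in vertices if find(w) == find(v))
           for v in vertices}
    return set(cls.values()), {(cls[u], cls[v]) for (u, v) in edges}


def same_out_neighbours(edges):
    """theta_~(x, y) := for all z: E(x, z) <-> E(y, z)."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    return lambda x, y: out.get(x, set()) == out.get(y, set())


# Toy graph: three layers of two vertices, all edges from each layer to the next.
toy_vertices = ["a1", "a2", "b1", "b2", "c1", "c2"]
toy_edges = {(u, v)
             for layer, nxt in [(["a1", "a2"], ["b1", "b2"]),
                                (["b1", "b2"], ["c1", "c2"])]
             for u in layer for v in nxt}
toy_classes, toy_quotient_edges = quotient_graph(
    toy_vertices, toy_edges, same_out_neighbours(toy_edges))
```

On the toy graph, vertices of the same layer have the same out-neighbours, so each layer collapses to one class and the quotient is a directed path on three vertices.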

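The following lemma shows that LREC= is closed under LREC=-reductions. Precisely,
this means that:

Lemma 6.6. Let $\tau_1,\tau_2$ be vocabularies, let $\ell \ge 1$, let

$$\Theta = \big( \theta_V(\bar u),\ \theta_\sim(\bar u,\bar v),\ (\theta_R(\bar u_{R,1},\dots,\bar u_{R,\mathrm{ar}(R)}))_{R \in \tau_2} \big)$$

be an $\ell$-ary LREC=[$\tau_1,\tau_2$]-transduction, and let $\varphi(x_1,\dots,x_\kappa,p_1,\dots,p_\lambda)$ be an LREC=[$\tau_2$]-
formula with $x_1,\dots,x_\kappa$ structure variables and $p_1,\dots,p_\lambda$ number variables.

Then there is an LREC=[$\tau_1$]-formula $\varphi^{-\Theta}(\bar u_1,\dots,\bar u_\kappa,\bar q_1,\dots,\bar q_\lambda)$, where $\bar u_1,\dots,\bar u_\kappa$ are
compatible with $\bar u$ and $\bar q_1,\dots,\bar q_\lambda$ are $\ell$-tuples of number variables, such that for all $\tau_1$-
structures $A$ where $\Theta[A]$ is defined, all $\bar a_1,\dots,\bar a_\kappa \in A^{\bar u}$ and all $\bar n_1,\dots,\bar n_\lambda \in N(A)^\ell$,

$$A \models \varphi^{-\Theta}[\bar a_1,\dots,\bar a_\kappa,\bar n_1,\dots,\bar n_\lambda] \iff \bar a_1/{\sim},\dots,\bar a_\kappa/{\sim} \in V(\Theta[A]),$$

$$\langle \bar n_1 \rangle_A,\dots,\langle \bar n_\lambda \rangle_A \in N(\Theta[A]), \text{ and}$$

$$\Theta[A] \models \varphi[\bar a_1/{\sim},\dots,\bar a_\kappa/{\sim},\langle \bar n_1 \rangle_A,\dots,\langle \bar n_\lambda \rangle_A],$$

where $\sim$ is the equivalence relation as defined in Definition 6.4.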

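Proof. The proof is by induction on the structure of $\varphi$. Without loss of generality, we
assume that $\varphi$ neither contains implication ($\rightarrow$) nor biimplication ($\leftrightarrow$).

To simplify the presentation, we consider a fixed $\tau_1$-structure $A$ where $\Theta[A]$ is defined
and let $\sim$ be the equivalence relation as defined in Definition 6.4. We also consider fixed
$\bar a_1,\dots,\bar a_\kappa \in A^{\bar u}$ and $\bar n_1,\dots,\bar n_\lambda \in N(A)^\ell$. The reader should consider $A$ and these tuples to
be universally quantified in the statements where they occur.

From $\theta_\sim(\bar u,\bar v)$, it is easy to construct an LREC=-formula $\theta'_\sim(\bar u,\bar v)$ such that $\theta'_\sim[A;\bar u,\bar v]$
is the equivalence relation generated by $\theta_\sim[A;\bar u,\bar v]$, that is, the reflexive, symmetric, and
transitive closure of $\theta_\sim[A;\bar u,\bar v]$. Let $\chi_s(\bar u) := \exists \bar v\,(\theta_V(\bar v) \wedge \theta'_\sim(\bar u,\bar v))$. Then for all $\bar a \in A^{\bar u}$,

$$A \models \chi_s[\bar a] \iff \bar a/{\sim} \in V(\Theta[A]).$$

Using the construction from the proof of [20, Lemma 2.4.3], we can construct an LREC=-
formula $\delta_V(\bar q)$ such that for all $\bar n \in N(A)^\ell$ we have $A \models \delta_V[\bar n]$ whenever $\langle \bar n \rangle_A = |V(\Theta[A])|$.
Hence, for $\chi_n(\bar q) := \exists \bar q'\,(\delta_V(\bar q') \wedge \text{``}\bar q \le \bar q'\text{''})$ and for all $\bar n \in N(A)^\ell$,

$$A \models \chi_n[\bar n] \iff \langle \bar n \rangle_A \in N(\Theta[A]).$$

Finally, let

$$\chi := \bigwedge_{i \in [\kappa]} \chi_s(\bar u_i) \wedge \bigwedge_{i \in [\lambda]} \chi_n(\bar q_i).$$

Then,

$$A \models \chi[\bar a_1,\dots,\bar a_\kappa,\bar n_1,\dots,\bar n_\lambda]
\iff \bar a_1/{\sim},\dots,\bar a_\kappa/{\sim} \in V(\Theta[A]) \text{ and } \langle \bar n_1 \rangle_A,\dots,\langle \bar n_\lambda \rangle_A \in N(\Theta[A]).$$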

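Given $\varphi(x_1,\dots,x_\kappa,p_1,\dots,p_\lambda)$ we now construct $\varphi^{-\Theta}(\bar u_1,\dots,\bar u_\kappa,\bar q_1,\dots,\bar q_\lambda)$ inductively
as follows:

(1) Suppose that $\varphi = R(x_{i_1},\dots,x_{i_k})$, where $i_1,\dots,i_k \in [\kappa]$. Let $I := \{i_1,\dots,i_k\}$.
Then,

$$\varphi^{-\Theta} := \chi \wedge (\exists \bar v_i)_{i \in I} \Big( \bigwedge_{i \in I} \theta'_\sim(\bar u_i,\bar v_i) \wedge \bigwedge_{i \in I} \theta_V(\bar v_i) \wedge \theta_R(\bar v_{i_1},\dots,\bar v_{i_k}) \Big).$$

(2) If $\varphi = x_i = x_j$, where $i,j \in [\kappa]$, then $\varphi^{-\Theta} := \chi \wedge \theta'_\sim(\bar u_i,\bar u_j)$.

(3) If $\varphi = p_i * p_j$, where $* \in \{=,\le\}$ and $i,j \in [\lambda]$, then $\varphi^{-\Theta} := \chi \wedge \text{``}\bar q_i * \bar q_j\text{''}$.

(4) If $\varphi = \neg\psi$, then $\varphi^{-\Theta} := \chi \wedge \neg\psi^{-\Theta}$.

(5) If $\varphi = \psi_1 * \psi_2$, where $* \in \{\wedge,\vee\}$, then $\varphi^{-\Theta} := \psi_1^{-\Theta} * \psi_2^{-\Theta}$.

(6) Suppose that $\varphi = Qu\,\psi$ with $Q \in \{\forall,\exists\}$ and $u \in \{x_1,\dots,x_\kappa,p_1,\dots,p_\lambda\}$. In case
that $Q = \forall$ and $u = x_i$, we let $\varphi^{-\Theta} := \chi \wedge \forall \bar u_i\,(\chi_s(\bar u_i) \rightarrow \psi^{-\Theta})$. The other cases can
be dealt with similarly.

(7) Suppose that $\varphi = \#(x_{i_1},\dots,x_{i_k},p_{i_{k+1}},\dots,p_{i_{k+m}})\,\psi = (p_{j_1},\dots,p_{j_{k'}})$. Based on the
construction from the proof of [20, Lemma 2.4.3], it is possible to construct an
LREC=-formula $\delta(\bar r_1,\dots,\bar r_{k'})$ such that for all $\bar m_1,\dots,\bar m_{k'} \in N(A)^\ell$,

$$A \models \delta[\bar m_1,\dots,\bar m_{k'}]
\iff \big| \{ (\bar a_{i_1}/{\sim},\dots,\bar a_{i_k}/{\sim},\langle \bar n_{i_{k+1}} \rangle_A,\dots,\langle \bar n_{i_{k+m}} \rangle_A) \mid
A \models \psi^{-\Theta}[\bar a_1,\dots,\bar a_\kappa,\bar n_1,\dots,\bar n_\lambda] \} \big| = \langle \bar m_1,\dots,\bar m_{k'} \rangle_A,$$

where

$$\langle \bar m_1,\dots,\bar m_{k'} \rangle_A := \sum_{s=1}^{k'} \langle \bar m_s \rangle_A \cdot |N(\Theta[A])|^{s-1}.$$

We then let $\varphi^{-\Theta} := \chi \wedge \delta(\bar q_{j_1},\dots,\bar q_{j_{k'}})$.

(8) Suppose that $\varphi = [\mathrm{lrec}_{\bar u',\bar v',\bar p'}\ \varphi_=, \varphi_E, \varphi_C](\bar w',\bar r')$. Then,

$$\varphi^{-\Theta} := \chi \wedge \exists \bar r\,\big( \beta(\bar r'',\bar r) \wedge [\mathrm{lrec}_{\bar u'',\bar v'',\bar p''}\ \varphi_=^{-\Theta}, \varphi_E^{-\Theta}, \varphi_C^{-\Theta}](\bar w'',\bar r) \big),$$

where $\bar u'',\bar v'',\bar w'',\bar p'',\bar r''$ are obtained from $\bar u',\bar v',\bar w',\bar p',\bar r'$ by replacing, for each $i \in [\kappa]$,
the variable $x_i$ by $\bar u_i$, and for each $i \in [\lambda]$, the variable $p_i$ by $\bar q_i$; $\bar r$ is a tuple of number
variables of length $\ell \cdot |\bar r'|$; and $\beta$ is defined as follows. For simplicity, assume that
$\bar r' = (p_1,\dots,p_k)$. Hence, $\bar r'' = (\bar q_1,\dots,\bar q_k)$. The formula $\beta(\bar r'',\bar r)$ has the property
that for all $\bar m \in N(A)^{|\bar r|}$,

$$A \models \beta[\bar n_1,\dots,\bar n_k,\bar m] \iff \langle \bar m \rangle_A = \sum_{s=1}^{k} \langle \bar n_s \rangle_A \cdot |N(\Theta[A])|^{s-1}.$$

Note that, since $|N(\Theta[A])| \le |N(A)|^\ell$, the tuple $\bar m$ is long enough to hold the sum
on the right hand side. Constructing $\beta$ as desired is a not too difficult exercise.
It is straightforward, though tedious, to verify that $\varphi^{-\Theta}$ is as desired. □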
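
The number encoding behind the formula $\beta$ in case (8) is a positional encoding with base $|N(\Theta[A])|$. The following Python sketch illustrates only this arithmetic on plain integers; the function names `encode` and `decode` and the example base are hypothetical, and the sketch says nothing about how $\beta$ itself is defined in the logic.

```python
def encode(digits, base):
    """Return sum over s = 1..k of digits[s-1] * base**(s-1).

    `digits` plays the role of the numbers <n_1>, ..., <n_k> and `base`
    the role of |N(Theta[A])|; every digit must lie in [0, base - 1].
    """
    assert all(0 <= d < base for d in digits)
    return sum(d * base ** s for s, d in enumerate(digits))


def decode(value, base, k):
    """Inverse of `encode` for tuples of length k."""
    digits = []
    for _ in range(k):
        value, d = divmod(value, base)
        digits.append(d)
    return digits


example_code = encode([2, 0, 1], 3)  # 2 * 3**0 + 0 * 3**1 + 1 * 3**2
```

Because every digit is smaller than the base, the encoding is injective on tuples of a fixed length, which is what allows a single tuple $\bar m$ to stand for the whole tuple of numbers.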

7. Capturing Logspace on Interval Graphs 

With the added expressive power of LREC=, it is not only possible to capture LOGSPACE 
on the class of all trees, but also on the class of all interval graphs, as we shall show in 
this section. Basically, interval graphs are graphs whose vertices are closed intervals, and 
whose edges join any two distinct intervals with a non-empty intersection. They form a 
well-established and widely investigated class of graphs, and it was recently shown [TS] (see 
also [20J) that interval graph canonisation is in LOGSPACE. 

To prove that LREC= captures LOGSPACE on interval graphs, we proceed as in the case 
of directed trees. First, we describe an LREC=-definable canonisation procedure for interval 
graphs, and then we use the fact that DTC (and hence LREC=) captures LOGSPACE on 
ordered structures. Our canonisation procedure combines algorithmic techniques from [T5] 
with the logical definability framework in [TU]. Parts of this section can be found in more 
detail in [2TI] . 

7.1. Background on Interval Graphs. In this section, we define interval graphs and 
state some basic properties. For a more detailed exposition, we refer the reader to |2Uj . 

Definition 7.1 (Interval graph, interval representation). Given a finite collection $\mathcal{I}$ of
closed intervals $I_i = [a_i,b_i] \subset \mathbb{N}$, let $G_{\mathcal{I}} = (V,E)$ be the graph with vertex set $V = \mathcal{I}$,
joining two distinct intervals $I_i,I_j \in V$ by an edge whenever $I_i \cap I_j \neq \emptyset$. We call $\mathcal{I}$ an
interval representation of a graph $G$ if $G \cong G_{\mathcal{I}}$. A graph $G$ is an interval graph if there is
an interval representation of $G$.
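
As a small illustration of Definition 7.1, the following Python sketch builds the graph $G_{\mathcal{I}}$ from a collection of closed intervals. Naming the intervals by strings and the function name `interval_graph` are conventions of this sketch only.

```python
from itertools import combinations


def interval_graph(intervals):
    """Build the interval graph of a dict {name: (a, b)} of closed intervals.

    Returns the vertex set and the set of undirected edges, each edge
    represented as a frozenset of two vertex names.
    """
    vertices = set(intervals)
    edges = set()
    for u, v in combinations(sorted(intervals), 2):
        (a1, b1), (a2, b2) = intervals[u], intervals[v]
        if max(a1, a2) <= min(b1, b2):  # the closed intervals intersect
            edges.add(frozenset((u, v)))
    return vertices, edges


# Hypothetical example: a path a - b - c and an isolated vertex d.
example_intervals = {"a": (1, 2), "b": (2, 4), "c": (3, 5), "d": (6, 6)}
example_vertices, example_edges = interval_graph(example_intervals)
```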

Figure 6 shows an interval graph $G$ together with an interval representation of $G$.

An interval representation $\mathcal{I}$ of a graph $G$ is called minimal if the set $\bigcup \mathcal{I} \subseteq \mathbb{N}$ is of minimum
size among all interval representations of $G$. Clearly, for any interval representation
$\mathcal{I}$ there exists a minimal interval representation $\mathcal{I}_{\min}$ such that $G_{\mathcal{I}} \cong G_{\mathcal{I}_{\min}}$.

Figure 6: An interval graph G and an interval representation of G.

Recall that a clique of a graph $G = (V,E)$ is a set $C \subseteq V$ such that the subgraph of $G$
induced by $C$ is complete. A maximal clique, or max clique, of $G$ is a clique of $G$ that is
not properly contained in another clique of $G$. We denote the set of all max cliques of $G$ by
$\mathcal{M}_G$. Let $\mathcal{I}$ be a minimal interval representation of $G$ and let $I_v$ denote the interval in $\mathcal{I}$ that
corresponds to vertex $v \in V$. Then $M(k) = \{v \mid k \in I_v\}$ is a max clique of $G$ for every $k$
for which $M(k)$ is non-empty. Furthermore, for any max clique $M$ of $G$, there is a $k \in \mathbb{N}$
with $M = M(k)$. Thus, any minimal interval representation of $G$ induces a linear order on
$\mathcal{M}_G$ which has the property that each vertex is contained in consecutive max cliques. It
is known [SJ [21] that a graph G is an interval graph if and only if its max cliques can be 
brought into a linear order, so that each vertex of G is contained in consecutive max cliques. 
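
The sets $M(k)$ and the consecutiveness property can be checked mechanically on small examples. The Python sketch below reads the max cliques off an interval representation; since the given representation need not be minimal, it keeps only the inclusion-maximal sets $M(k)$, which is an addition of this sketch. The function names and the example intervals are hypothetical.

```python
def max_cliques_from_representation(intervals):
    """List the max cliques of an interval graph in left-to-right order.

    intervals: dict {vertex: (a, b)} of closed intervals over the integers.
    """
    points = sorted({k for a, b in intervals.values() for k in range(a, b + 1)})
    cliques = []
    for k in points:
        m_k = frozenset(v for v, (a, b) in intervals.items() if a <= k <= b)
        if m_k not in cliques:
            cliques.append(m_k)
    # For a non-minimal representation some M(k) are not maximal; drop those.
    return [c for c in cliques if not any(c < d for d in cliques)]


def appears_consecutively(cliques):
    """Check that every vertex lies in a consecutive run of the listed cliques."""
    for v in set().union(*cliques):
        idx = [i for i, c in enumerate(cliques) if v in c]
        if idx != list(range(idx[0], idx[-1] + 1)):
            return False
    return True


example_intervals = {"a": (1, 2), "b": (2, 4), "c": (3, 5), "d": (6, 6)}
example_cliques = max_cliques_from_representation(example_intervals)
```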

Thus, max cliques play an important role for the structure of interval graphs. Our 
canonisation procedure essentially relies on bringing the max cliques of an interval graph 
into a suitable order. 

The maximal cliques of an interval graph $G = (V,E)$ can be handled rather easily in
our logic. Let $N^c(v)$ denote the closed neighbourhood of a vertex $v$ in $G$, that is, the set
containing $v$ and all vertices adjacent to $v$. As shown in [19], the max cliques of $G$ can be
identified by the vertex pairs $(u,v) \in V^2$ with the property that $N^c(u) \cap N^c(v)$ is a clique
in $G$, and for no other pair $(u',v') \in V^2$ where $N^c(u') \cap N^c(v')$ is a clique in $G$ it holds that
$N^c(u) \cap N^c(v) \subsetneq N^c(u') \cap N^c(v')$:

Lemma 7.2 ([19], Lemma IV.1). Let $G$ be an interval graph and let $M$ be a max clique of
$G$. Then there are vertices $u,v \in M$, not necessarily distinct, such that $M = N^c(u) \cap N^c(v)$.

In particular, the max cliques of $G$ as well as the equivalence relation on vertex pairs defining
the same max clique are first-order definable.
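
The characterisation via closed neighbourhoods translates directly into a brute-force procedure. The Python sketch below is meant only as an illustration on small interval graphs; the function names are hypothetical, and on graphs that are not interval graphs the procedure may miss max cliques, since Lemma 7.2 is only stated for interval graphs.

```python
def closed_neighbourhood(vertices, edges, v):
    """N^c(v): v together with all vertices adjacent to v."""
    return {v} | {w for w in vertices if frozenset((v, w)) in edges}


def is_clique(edges, s):
    return all(frozenset((u, w)) in edges for u in s for w in s if u != w)


def max_cliques(vertices, edges):
    """Max cliques of an interval graph as maximal cliques of the form
    N^c(u) & N^c(v), following the characterisation stated above."""
    nbh = {v: closed_neighbourhood(vertices, edges, v) for v in vertices}
    candidates = {frozenset(nbh[u] & nbh[v]) for u in vertices for v in vertices}
    candidates = {c for c in candidates if is_clique(edges, c)}
    return {c for c in candidates if not any(c < d for d in candidates)}


# Hypothetical example: the path a - b - c plus an isolated vertex d.
example_vertices = {"a", "b", "c", "d"}
example_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
example_max_cliques = max_cliques(example_vertices, example_edges)
```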

7.2. Modular Decompositions. Our canonisation procedure relies on a specific decom- 
position of graphs, known as modular decomposition, which was first introduced by Gallai 
[8]. The basic building blocks of modular decompositions are modules. Given a graph 
$G = (V,E)$, a set $W \subseteq V$ is a module of $G$ if for all vertices $v \in V \setminus W$ either $\{v\} \times W \subseteq E$
or $(\{v\} \times W) \cap E = \emptyset$. Note that $V$ and all singleton vertex sets are modules of $G$, called
trivial modules. We call a module $W$ proper if $W \subsetneq V$.
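
The module condition is easy to test directly. The following Python sketch checks it for an undirected graph given by a set of two-element frozensets; the function name `is_module` and the example graph are hypothetical.

```python
def is_module(vertices, edges, w):
    """Check whether the vertex set w is a module of the undirected graph.

    Every vertex outside w must be adjacent to all of w or to none of w.
    """
    w = set(w)
    for v in set(vertices) - w:
        adjacent = [frozenset((v, x)) in edges for x in w]
        if any(adjacent) and not all(adjacent):
            return False
    return True


# Hypothetical example: c is adjacent to a, b and d; there are no other edges.
example_vertices = {"a", "b", "c", "d"}
example_edges = {frozenset(("a", "c")), frozenset(("b", "c")), frozenset(("c", "d"))}
```

In the example, `{"a", "b"}` is a proper non-trivial module, because `c` sees both of its vertices and `d` sees neither, whereas `{"a", "c"}` is not a module, because `b` is adjacent to `c` but not to `a`.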

Gallai's modular decomposition is based on the following: If $G$ is not connected, then its
connected components $W_1,\dots,W_k$ are clearly proper modules. Similarly, if the complement
graph $G^c$ of $G$ is not connected, then the connected components $W_1,\dots,W_k$ of $G^c$ are proper
modules of $G$. For graphs $G$ with more than one vertex where both $G$ and $G^c$ are connected,
Gallai shows in [5] that the set of maximal proper modules of $G$ is a partition of $G$'s vertex
set. We base our modular decomposition on the same properties, only for the last one we
use a slightly different partition into modules $W_1,\dots,W_k$, which we define in Section 7.4.²
Let $\mathcal{W}_G$ be the set of modules $W_1,\dots,W_k$ and let $\sim_G$ be the equivalence relation on $V$
corresponding to the partition $\mathcal{W}_G$ (i.e., $v \sim_G w$ whenever $v,w \in W_i$ for some $i \in [k]$). Let
us consider the graph

$$L_G := (V/{\sim_G}, E_{L_G}), \quad \text{where } E_{L_G} := \{ (u/{\sim_G}, v/{\sim_G}) \mid (u,v) \in E \}.$$

Intuitively, $L_G$ is the graph obtained from $G$ by collapsing all the modules in $\mathcal{W}_G$ into
single vertices. Since each pair of modules $W_i,W_j \in \mathcal{W}_G$, $i \neq j$, is either completely
connected or completely disconnected, $G$ is completely determined by $L_G$ and the graphs
$G[W_i]$, for $i \in [k]$, where $G[W_i]$ denotes the subgraph of $G$ induced by the vertices in $W_i$.
By decomposing the $G[W_i]$, $i \in [k]$, inductively until we arrive at singleton sets everywhere,
we obtain $G$'s modular decomposition.

² The main difference between our decomposition and Gallai's is that we do not bother to create extra
modules for sets of pairwise connected twins since we can handle them perfectly well with our methods.

We define the modular decomposition tree $T(G)$ of a graph $G$ recursively. If $|V| = 1$,
then $T(G)$ is the rooted tree that consists of only one vertex, vertex $V$, which is the root
of $T(G)$. Let $|V| > 1$. Then, the modular decomposition tree $T(G)$ is a rooted tree which
consists of a vertex $V$, which is the root of $T(G)$, and of subtrees $T(G[W])$ for all $W \in \mathcal{W}_G$.
We obtain $T(G)$ by adding an edge from $V$ to the root of $T(G[W])$ for all $W \in \mathcal{W}_G$. This
modular decomposition tree is uniquely determined for every graph $G$ [8].

Notice that for an interval graph $G$ where $G^c$ is not connected, all except one connected
component of $G^c$ must contain only a single vertex. Each of these single vertices is adjacent
to all other vertices in $G$. We call a vertex with that property an apex. Thus, if $G$ is an
interval graph with $G^c$ disconnected, then $\mathcal{W}_G = \bigcup_{a \in A} \{\{a\}\} \cup \{V \setminus A\}$ where $A$ is the set
of apices, and the graph $L_G$ is isomorphic to a clique. Also, if $G$ contains an apex, then
either $|V| = 1$ or $G^c$ is not connected.

The following three sections are about defining and canonising the graph $L_G$ for an
interval graph $G$. This is easy for disconnected graphs $G$ or graphs that have at least one
apex. Thus, we will consider connected graphs without any apices.



7.3. Extracting Information About the Order of Maximal Cliques. Throughout 
this section let G be a connected interval graph without any apices. 

We call a max clique $C$ a possible end of $G$ if there is a minimal interval representation
$\mathcal{I}$ of $G$ so that $C$ is minimal with respect to the order induced by $\mathcal{I}$.

Now we pick a max clique $M$ of $G$. We assume it to be a possible end of $G$, and give
a recursive procedure that turns out to recover all the information about the order of the
max cliques induced by choosing $M$ as an end of $G$.

Let $M \in \mathcal{M}_G$. The binary relation $\prec_M$ is defined recursively on the elements of $\mathcal{M}_G$
as follows:

Initialisation: $M \prec_M C$ for all $C \in \mathcal{M}_G \setminus \{M\}$

$$C \prec_M D \quad \text{if} \quad \begin{cases} \exists E \in \mathcal{M}_G \text{ with } E \prec_M D \text{ and } (E \cap C) \setminus D \neq \emptyset, \text{ or} \\ \exists E \in \mathcal{M}_G \text{ with } C \prec_M E \text{ and } (E \cap D) \setminus C \neq \emptyset. \end{cases} \qquad (7.1)$$

By exploiting the definition's symmetry, $\prec_M$ can be defined through a reachability
query in the undirected graph $O_M$, which has pairs of max cliques from $\mathcal{M}_G$ as its vertices,
and in which two vertices $(A,B)$ and $(C,D)$ are connected by an edge whenever $A \prec_M B$
implies $C \prec_M D$ with one application of (7.1). Hence:



Lemma 7.3. There exists an STC-formula that for any interval graph $G$ and for any max
clique $M$ of $G$ defines the relation $\prec_M$.

We now state a few important properties of $\prec_M$. Recall that a binary relation $R$ on a
set $A$ is asymmetric if $ab \in R$ implies $ba \notin R$ for all $a,b \in A$. In particular, asymmetric
relations are irreflexive.

Lemma 7.4 ([19], Lemma IV.3, Corollary IV.6, Lemma IV.7). Let $M$ be a max clique of
an interval graph $G$. Then the following properties are equivalent:

• $\prec_M$ is asymmetric,

• $\prec_M$ is a strict weak order (that is, $\prec_M$ is irreflexive, transitive, and incomparability
is an equivalence relation),

• $M$ is a possible end of $G$.

Since $\prec_M$ is STC-definable and asymmetry of $\prec_M$ is FO-definable, the preceding lemma
gives us a way to define possible ends of interval graphs in STC+C.
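
The recursive definition of $\prec_M$ and the asymmetry test of Lemma 7.4 can be tried out on small examples. The Python sketch below computes $\prec_M$ by a naive fixed-point iteration instead of the reachability query in $O_M$ described above; the function names and the example (the max cliques of a path on four vertices) are hypothetical.

```python
def precedes(max_cliques, m):
    """Compute the relation <_M on a list of max cliques (frozensets)."""
    cliques = list(max_cliques)
    rel = {(m, c) for c in cliques if c != m}  # initialisation
    changed = True
    while changed:
        changed = False
        for c in cliques:
            for d in cliques:
                if (c, d) in rel:
                    continue
                derived = any(
                    ((e, d) in rel and (e & c) - d)     # E <_M D and (E n C) \ D nonempty
                    or ((c, e) in rel and (e & d) - c)  # C <_M E and (E n D) \ C nonempty
                    for e in cliques)
                if derived:
                    rel.add((c, d))
                    changed = True
    return rel


def is_possible_end(max_cliques, m):
    """By Lemma 7.4, M is a possible end iff <_M is asymmetric."""
    rel = precedes(max_cliques, m)
    return all((d, c) not in rel for (c, d) in rel)


# Max cliques of the path 1 - 2 - 3 - 4.
A, B, C = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
example_relation = precedes([A, B, C], A)
```

Starting from the outer clique `A` yields the linear order `A`, `B`, `C`, while starting from the middle clique `B` derives both `A` before `C` and `C` before `A`, so `B` is not a possible end.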

Lemma 7.5. Let $\mathcal{C} \subseteq \mathcal{M}_G$ be a set of max cliques with $M \notin \mathcal{C}$. Suppose that for all
$B \in \mathcal{M}_G \setminus \mathcal{C}$ and any $C,C' \in \mathcal{C}$ it holds that $B \cap C = B \cap C'$. Then the max cliques in $\mathcal{C}$
are mutually incomparable with respect to $\prec_M$.

Proof. By a derivation chain of length $k$ we mean a finite sequence $X_0 \prec_M Y_0$, $X_1 \prec_M Y_1$,
$\dots$, $X_k \prec_M Y_k$ such that $X_0 = M$ and for each $i \in [k]$, the relation $X_i \prec_M Y_i$ follows from
$X_{i-1} \prec_M Y_{i-1}$ by one application of (7.1). Clearly, whenever it holds that $X \prec_M Y$ there is
a derivation chain that has $X \prec_M Y$ as its last element.

Suppose for contradiction that there are $C,C' \in \mathcal{C}$ with $C \prec_M C'$. Let $M \prec_M Y_0$,
$X_1 \prec_M Y_1$, $\dots$, $X_k \prec_M Y_k$ be a derivation chain for $C \prec_M C'$. Since $X_k = C$, $Y_k = C'$, and
$M \notin \mathcal{C}$, there is a largest index $i$ so that either $X_i$ or $Y_i$ is not contained in $\mathcal{C}$.

If $X_i \notin \mathcal{C}$, then $X_{i+1} \in \mathcal{C}$ and $Y_i = Y_{i+1} \in \mathcal{C}$ and it holds that $X_i \cap X_{i+1} \setminus Y_{i+1} \neq \emptyset$.
Consequently, $X_i \cap X_{i+1} \neq X_i \cap Y_{i+1}$, contradicting the assumption of the lemma. Similarly,
if $Y_i \notin \mathcal{C}$, then $Y_{i+1} \in \mathcal{C}$ and $X_i = X_{i+1} \in \mathcal{C}$ and it holds that $Y_i \cap Y_{i+1} \setminus X_{i+1} \neq \emptyset$. Thus,
$Y_i \cap Y_{i+1} \neq Y_i \cap X_{i+1}$, again a contradiction. □

The span of a vertex $v \in V$ in $G$, denoted $\mathrm{span}(v)$, is the number of max cliques of $G$
that $v$ is contained in. Recall from Section 7.1 that the equivalence relation on vertex pairs
defining the same max clique is first-order definable. Note that, since equivalence classes
can be counted in STC+C [19, Lemma II.7], the span of a vertex is STC+C-definable on
the class of all interval graphs.

Lemma 7.6 ([19], Lemma IV.4, Corollary IV.5). Suppose $M$ is a possible end of $G$ and $\mathcal{C}$
is a maximal set of $\prec_M$-incomparable max cliques. Then

• $B \cap C = B \cap C'$ for all $C,C' \in \mathcal{C}$, $B \in \mathcal{M}_G \setminus \mathcal{C}$,

• $S_{\mathcal{C}} := \bigcup_{C \in \mathcal{C}} C \setminus \bigcup_{B \in \mathcal{M}_G \setminus \mathcal{C}} B$ is a module of $G$, and

• $S_{\mathcal{C}} = \{ v \in \bigcup \mathcal{C} \mid \mathrm{span}(v) \le |\mathcal{C}| \}$. □

Finally, let $\sim^G_M$ be the equivalence relation on $V$ for which $x \sim^G_M y$ if and only if $x = y$, or
there is a maximal set $\mathcal{C}$ of incomparable max cliques with respect to $\prec_M$ with $|\mathcal{C}| > 1$ so that
$x,y \in S_{\mathcal{C}}$. Let $G_M = G/{\sim^G_M} := (V/{\sim^G_M}, E_M)$, where $E_M := \{ (u/{\sim^G_M}, v/{\sim^G_M}) \mid (u,v) \in E \}$. It
is easy to check that $\sim^G_M$ and the graph $G_M$ are STC+C-definable.

If C is a maximal set of -< A/-incomparables in G with |C| > 1, then there is precisely 
one max clique Mq in Gm which contains all the equivalence classes associated with C, i.e., 
Me = {via \ve\JC}. We conclude: 

M 

Lemma 7.7. -<m induces a linear order on Gm 's max cliques. In particular, Gm is an 
interval graph. □ 
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7.4. Modules Wg and the Graph Lq. We are now ready to give the definition of the 
set Wq, which we mentioned in Section 17.21 for connected interval graphs G without an 
apex. Furthermore, we show how to define graphs that are isomorphic to the graph Lq 
from Section T7.2I in STC+C. In particular, this will enable us to prove, in Section [7.51 that 
an isomorphic copy of Lq on the number sort is STC+C-definable. 

Let G = (V, E) be a connected interval graph without an apex. Then G contains more 
than one max clique. Let ^q be the set of all maximal proper subsets C of A4q with the 
property that for any B G Mq \ C we have B n C = B n C for all C, C G C. We must 
have |^Pg| > 3 since G is connected and no vertex may be included in all max cliques of G. 
Furthermore, if C,C G tyc and C ^ C , then CnC = 0. To see this, suppose that D G CflC. 
Then B n A = B n D = B n C for all A, C G C U C and B $ C U C. So as [*p G [ > 3, C U C 
is a proper subset of .Mg satisfying the above property, which contradicts the maximality 
of C and C We conclude that *Pg is a partition of .Mg- 

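For each $\mathcal{C} \in \mathfrak{P}_G$ with $|\mathcal{C}| \geq 2$ we define $S_{\mathcal{C}} := \bigcup \mathcal{C} \setminus \bigcup(\mathcal{M}_G \setminus \mathcal{C})$. The correspondence in
names to the modules $S_{\mathcal{C}}$ as defined in Lemma 7.6 is intended, of course, and makes sense
since the sets $\mathcal{C} \in \mathfrak{P}_G$ enjoy the same interaction properties with the rest of the graph as
maximal sets of $\prec_M$-incomparable max cliques (cf. Lemma 7.6).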

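We can now define the modules $W_G$ mentioned in Section 7.2 for connected interval
graphs $G$ without an apex. We let $\mathcal{S} := \{S_{\mathcal{C}} \mid \mathcal{C} \in \mathfrak{P}_G \text{ with } |\mathcal{C}| \geq 2\}$, and define

$$W_G := \mathcal{S} \cup \bigcup_{v \in V \setminus \bigcup \mathcal{S}} \{\{v\}\}.$$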

From the fact that $\mathfrak{P}_G$ is a partition of $\mathcal{M}_G$, we conclude that $W_G$ forms a partition of
$V$, thereby inducing the equivalence relation $\sim_G$ on $V$. In the following, we alternatively call this
equivalence relation $\sim_{\mathfrak{P}_G}$.
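
To make the definitions of $\mathfrak{P}_G$, $S_{\mathcal{C}}$ and $W_G$ concrete, the following brute-force sketch computes them from an explicit list of max cliques. It is only an illustration on small inputs and has nothing to do with definability in STC+C; the function names are hypothetical, max cliques are assumed to be given as `frozenset`s, and $S_{\mathcal{C}}$ is taken to be $\bigcup \mathcal{C} \setminus \bigcup(\mathcal{M}_G \setminus \mathcal{C})$.

```python
from itertools import combinations


def has_property(cell, all_cliques):
    """Every max clique B outside `cell` meets all members of `cell` in the same set."""
    outside = [B for B in all_cliques if B not in cell]
    return all(len({B & C for C in cell}) == 1 for B in outside)


def partition_P(all_cliques):
    """Maximal proper subsets of the max cliques with the above property."""
    cliques = list(all_cliques)
    good = [frozenset(c)
            for r in range(1, len(cliques))        # proper subsets only
            for c in combinations(cliques, r)
            if has_property(set(c), cliques)]
    return {c for c in good if not any(c < d for d in good)}


def modules_W(vertices, all_cliques):
    """W_G: the sets S_C for cells with at least two cliques, plus singletons."""
    S = set()
    for cell in partition_P(all_cliques):
        if len(cell) >= 2:
            inside = frozenset().union(*cell)
            rest = [B for B in all_cliques if B not in cell]
            S.add(inside - frozenset().union(*rest))
    covered = frozenset().union(*S) if S else frozenset()
    return S | {frozenset([v]) for v in vertices if v not in covered}
```

On the interval graph with max cliques $\{x,p_1\}$, $\{x,p_2\}$, $\{x,y\}$, $\{y,z\}$ the sketch returns three cells, one of which contains the first two cliques, and the only non-trivial module is $\{p_1,p_2\}$.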

We are going to construct STC+C-definable graphs isomorphic to $L_G$. Let $Z_M$ be the
max clique which is $\prec_M$-maximal in $G_M$. Now we forget about $\prec_M$ and consider $\prec_{Z_M}$ on
$G_M$. We write

$$L_M := G_M/{\sim^{G_M}_{Z_M}} = \bigl(V(G_M)/{\sim^{G_M}_{Z_M}},\; E(G_M)/{\sim^{G_M}_{Z_M}}\bigr)$$

with $E(G_M)/{\sim^{G_M}_{Z_M}} = \{(u/{\sim^{G_M}_{Z_M}}, v/{\sim^{G_M}_{Z_M}}) \mid (u,v) \in E(G_M)\}$. Lemma 7.7 implies again that
$\prec_{Z_M}$ induces a linear order on the max cliques of $L_M$.

Lemma 7.8. Let $G$ be a connected interval graph that does not contain an apex, and let
$M_1, \dots, M_k$ be its possible ends. Then all of the graphs $L_{M_i}$, $i \in [k]$, are isomorphic to $L_G$
and we may partition $[k]$ into at most two sets $Q, Q'$ so that $(L_{M_i}, \prec_{Z_{M_i}})$ and $(L_{M_j}, \prec_{Z_{M_j}})$
are order isomorphic whenever $i,j \in Q$ or $i,j \in Q'$.

Proof. Equivalence relation $\sim_{\mathfrak{P}_G}$ does the same as $\sim^G_M$, only that it is based on $\mathfrak{P}_G$ instead
of the (finer) partition of max cliques induced by a strict weak ordering $\prec_M$.

Our goal is to show that each $L_M$ with $M \in \{M_1, \dots, M_k\}$ is isomorphic to $G/{\sim_{\mathfrak{P}_G}}$.
For this it is enough to show that the concatenation of equivalence relation $\sim^G_M$ with $\sim^{G_M}_{Z_M}$
is equal to $\sim_{\mathfrak{P}_G}$. Whenever $\mathcal{C} \in \mathfrak{P}_G$ and $M \notin \mathcal{C}$, Lemma 7.5 implies that the max cliques in
$\mathcal{C}$ are $\prec_M$-incomparable. As the sets in $\mathfrak{P}_G$ were chosen to be maximal, $\mathcal{C}$ is also a maximal
set of $\prec_M$-incomparables (Lemma 7.6). It follows that $\sim_{\mathfrak{P}_G}$ is equal to $\sim^G_M$ on $\bigcup_{M \notin \mathcal{C} \in \mathfrak{P}_G} \bigcup \mathcal{C}$.
When forming $G_M = G/{\sim^G_M}$, each maximal set of $\prec_M$-incomparable max cliques $\mathcal{C}$ is
replaced by the max clique $M_{\mathcal{C}} = \{v/{\sim^G_M} \mid v \in \bigcup \mathcal{C}\}$. Note that this is also true when $\mathcal{C}$
consists of just one max clique. As a result, $\mathfrak{P}_G$ induces a partition $\mathfrak{P}_M$ of the max cliques
of $G_M$. Also, if $\mathcal{C}_M$ is the cell of $\mathfrak{P}_M$ which contains $M$, then $\mathcal{C}_M$ is the only cell of $\mathfrak{P}_M$
which is possibly not a singleton. As $|\mathfrak{P}_M| \geq 3$, $Z_M \notin \mathcal{C}_M$.

The final step is to show that $\sim_{\mathfrak{P}_M}$ equals $\sim^{G_M}_{Z_M}$ on $V(G_M)$. If $v$ is a vertex of $G$
and $v/{\sim^G_M}$ is an equivalence class of $\sim^G_M$ with $|v/{\sim^G_M}| > 1$, then $v/{\sim^G_M}$ is only contained
in one max clique of $G_M$. Hence, $\mathfrak{P}_M$ inherits from $\mathfrak{P}_G$ the property that it partitions
the max cliques $\mathcal{M}_{G_M}$ of $G_M$ into maximal sets $\mathcal{C}$ so that for any $B \in \mathcal{M}_{G_M} \setminus \mathcal{C}$ we have
$B \cap C = B \cap C'$ for all $C, C' \in \mathcal{C}$. Arguing analogously as above, it follows that $\sim_{\mathfrak{P}_M}$ equals
$\sim^{G_M}_{Z_M}$. Therefore, $\sim_{\mathfrak{P}_G}$ is equal to the concatenation of $\sim^G_M$ with $\sim^{G_M}_{Z_M}$, and $L_M$ is isomorphic
to $L_G$. This proves the first part of the lemma.

To see the second part, observe that $\prec_{Z_M}$ induces a linear order on $L_M$'s max cliques.
This is true for all $M \in \{M_1, \dots, M_k\}$, so whenever $N$ is a possible end of $L_M$, then $\prec_N$
linearly orders the max cliques of $L_M$. Thus, $L_M$ has two possible ends which correspondingly
induce two orders on the max cliques and vertices of $G/{\sim_{\mathfrak{P}_G}}$. □



7.5. Canonising $L_G$. Before showing how to use the modular decomposition tree for canonising
interval graphs $G = (V, E)$ in our logic, let us take a look at how to define a canonical
copy of $L_G$ in STC+C.

From the fact that $G$ is an interval graph, it is not hard to see that $L_G$ is an interval
graph, too. Furthermore, notice that, if $A$ is a max clique of $G$, then

$$A_{L_G} := \{v/{\sim_G} \mid v \in A\}$$

is a max clique of $L_G$, and that all max cliques of $L_G$ are of this form.

Lemma 7.9.

(1) There are STC+C-formulae $\varphi_\sim$, $\varphi_L$ such that for all interval graphs $G$, $\varphi_\sim$ defines
the equivalence relation $\sim_G$, and $\varphi_L$ the edge relation of the graph $L_G$.

(2) Let $G$ be a connected graph without any apices. If $L_G$ has $m > 1$ max cliques, then
there exist exactly two linear orderings of $L_G$'s max cliques, each the reverse of the
other. There is an STC+C-formula that defines all pairs of tuples $(u,v), (u',v') \in V^2$
such that $(u,v), (u',v')$ represent max cliques $A, M$ of $G$, $M$ is a possible end of $G$,
and $A_{L_G}$ appears within the first $\lfloor m/2 \rfloor$ max cliques of $L_G$ with respect to $\prec_{Z_M}$.

(3) There is an STC+C-formula that, for all interval graphs $G$, defines an isomorphic
copy of $L_G$ on the number sort.

Proof. Let us start by showing Property (1). If $G$ is not connected or $G$ contains an apex,
then $\sim_G$ is STC-definable. If $G$ is connected and does not contain an apex, then for each
possible end $M$ of $G$ the concatenation of equivalence relation $\sim^G_M$ with $\sim^{G_M}_{Z_M}$ is equal to $\sim_G$
(Lemma 7.8). The STC+C-definability of equivalence relation $\sim_G$ is a direct consequence of
the STC+C-definability of the possible ends $M$ and the equivalence relation $\sim^G_M$, of Lemma 7.7,
which allows us to define max clique $Z_M$, and of the STC+C-definability of $\sim^{G_M}_{Z_M}$.

We do not define the graph $L_G$ explicitly, but rather implicitly within $G$. That is, we do
not single out a representative of each equivalence class $v/{\sim_G}$ of $\sim_G$, but treat all vertices
in $v/{\sim_G}$ as representatives of $v/{\sim_G}$. Notice that, since all equivalence classes of $\sim_G$ are
modules of $G$, the edge relation of $L_G$ can be defined as the set of all edges of $G$ between
vertices in different equivalence classes.
To show Property (2), recall that by Lemma 7.8 there are exactly two linear orderings
of $L_G$'s max cliques, each the reverse of the other. By Property (1) we can define $L_G$, and
for a possible end $M$ of $L_G$ we can define the linear order $\prec_{Z_M}$ (Lemma 7.3). Hence, given
max clique $A$, we can define $A_{L_G}$ and associate the linear order with $A$ where $A_{L_G}$ appears
within the first $\lfloor m/2 \rfloor$ max cliques of $L_G$.

Property (3) is easy to see for graphs that are not connected or contain an apex. For
connected interval graphs $G$ that do not have any apices, Property (3) follows directly from
Section IV.B in [10], where the author shows that there is an STC+C-formula that defines
an ordered copy of $G$ on the number sort if there is a max clique $M$ of $G$ such that $\prec_M$ is
a linear order on $G$'s max cliques. □

According to the preceding lemma we can define an isomorphic copy of $L_G$ on the
number sort. In the following, we denote this copy by $\mathcal{K}(L_G)$.

7.6. The Coloured Modular Decomposition Tree. To obtain a complete invariant of 
an interval graph G = (V, E), we construct a refinement of the modular decomposition tree, 
the coloured modular decomposition tree, in this section. 

Let us consider the modular decomposition tree $T(G)$ of an interval graph $G$. We call
a module $W \in V(T(G))$ a decomposition module if $W = V$, or $|W| > 1$ and $G[W^*]$ is a
connected graph, where $W^*$ is the parent of $W$ in $T(G)$. All modules $W$ where $G[W]$ is not
connected are called component modules. We let $\mathcal{W}_G^{\mathrm{dec}}$ be the set of all decomposition modules
and $\mathcal{W}_G^{\mathrm{con}}$ be the set of all component modules occurring in the modular decomposition
tree of $G$.

Let $P' := \{(M, n) \mid M \in \mathcal{M}_G,\ n \in [|V|]\}$. Recall the definition of the span of a vertex
from Section 7.3 and that it is STC+C-definable. For each $(M,n) \in P'$, define $V_{M,n}$ as the
set of vertices of the connected component of $G[\{v \in V \mid \mathrm{span}(v) \leq n\}]$ which intersects
with $M$ (if non-empty), and let $G_{M,n} := G[V_{M,n}]$. Now let $P$ be the set of those $(M, n) \in P'$
for which the following properties are satisfied:

(1) The number $n$ is maximal among those $n'$ with the property that $V_{M,n'} = V_{M,n}$.

(2) For all $m' > n$ where $V_{M,m'}$ is a module, $V_{M,n}$ is a subset of an equivalence class of
$\sim_{G_{M,m'}}$ with more than one vertex, or there exists a vertex $a \in V_{M,m'} \setminus V_{M,n}$ that is
an apex of $G_{M,m'}$.
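
The sets $V_{M,n}$ can be computed directly on small examples. The sketch below assumes that the span of a vertex (defined in Section 7.3, which is not reproduced here) is the number of max cliques containing it; the function names and the adjacency-dictionary encoding are hypothetical and serve only as an illustration.

```python
def span(v, max_cliques):
    """Number of max cliques containing v (assumed definition of the span)."""
    return sum(1 for C in max_cliques if v in C)


def component_V(adj, max_cliques, M, n):
    """V_{M,n}: the component of G[{v : span(v) <= n}] that intersects M.

    adj maps each vertex to the set of its neighbours.
    Returns the empty set if no vertex of M has span at most n.
    """
    allowed = {v for v in adj if span(v, max_cliques) <= n}
    start = next((v for v in M if v in allowed), None)
    if start is None:
        return frozenset()
    seen, stack = {start}, [start]
    while stack:                       # depth-first search inside `allowed`
        u = stack.pop()
        for w in adj[u]:
            if w in allowed and w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(seen)
```

Since $M$ is a clique, all vertices of $M$ with span at most $n$ lie in the same component, so the choice of the start vertex does not matter.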

Lemma 7.10. If $(M,n) \in P$, then $V_{M,n}$ is a connected component of a decomposition
module in $\mathcal{W}_G^{\mathrm{dec}}$. Moreover, if $D$ is a connected component of a decomposition module in
the modular decomposition tree of $G$, then there is an $(M, n) \in P$ with $V_{M,n} = D$.

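Proof. Notice that for all modules $W$ of $G$ and all max cliques $C$ of $G$ with $C \cap W \neq \emptyset$ the
set $W \cap C$ is a max clique of $G[W]$, and every max clique of $G[W]$ is of that form. Further,
an easy induction shows that for all modules $W \in \mathcal{W}_G^{\mathrm{dec}} \cup \mathcal{W}_G^{\mathrm{con}}$ the following properties are
satisfied:

(A) Let $C, C' \in \mathcal{M}_G$ be max cliques of $G$ with $C \neq C'$ where $C \cap W \neq \emptyset$ and $C' \cap W \neq \emptyset$.
Then for max cliques $C \cap W$, $C' \cap W$ of $G[W]$ we have $C \cap W \neq C' \cap W$.

(B) Let $\mathcal{C} := \{C \in \mathcal{M}_G \mid C \cap W \neq \emptyset\}$. Then for all $B \in \mathcal{M}_G \setminus \mathcal{C}$ and all $C, C' \in \mathcal{C}$ we
have $B \cap C = B \cap C'$.

(C) For the set $\mathcal{C}$ from (B), $W = \bigcup_{C \in \mathcal{C}} V_{C,c}$ where $c := |\mathcal{C}|$ if $W$ contains an apex and
$c := |\mathcal{C}| - 1$ if $W$ has no apices, and for each $C \in \mathcal{C}$ the set $V_{C,c}$ is a connected
component of $W$.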
In order to show Lemma 7.10 we also need the following properties:

Claim 1. If $V_{M,k}$ is a connected component of a decomposition module $W$ of $G$, and
$V_{M,k} \subsetneq V_{M,l}$ for an $l > k$, then $W \subseteq V_{M,l}$.

Proof. Let $k'$ be the maximum span of a vertex in $W$. Since $V_{M,k} \in \mathcal{W}_G^{\mathrm{dec}} \cup \mathcal{W}_G^{\mathrm{con}}$, we
have $V_{M,k} = V_{M,k'}$ as a direct consequence of Property (C). Thus, we can assume that $k \geq k'$.
Further, we have $V_{M,l} \not\subseteq W$ as $V_{M,l} \subseteq W$ leads to a contradiction, because $V_{M,l}$ is connected
and $V_{M,k} \subsetneq V_{M,l}$ is a connected component of $W$. Thus, $V_{M,l} \setminus W$ is non-empty. Since $V_{M,l}$
is connected and $V_{M,k} \subseteq V_{M,l} \cap W$, there must exist a vertex $v \in V_{M,l} \setminus W$ that is adjacent
to a vertex in the non-empty set $V_{M,l} \cap W$. As $W$ is a module, $v$ is adjacent to all vertices
in $W$. Therefore, $W \subseteq V_{M,l}$, because $\mathrm{span}(w) \leq k' < l$ for all vertices $w \in W$. □

Claim 2. Let $(M, d) \in P'$ and $V_{M,d}$ be a module in $\mathcal{W}_G^{\mathrm{dec}} \cup \mathcal{W}_G^{\mathrm{con}}$. If $V_{M,d}$ is a clique, then
there exists only one max clique $C \in \mathcal{M}_G$ with $C \cap V_{M,d} \neq \emptyset$.

Proof. Since $V_{M,d}$ is a clique, there must exist a max clique $B \in \mathcal{M}_G$ with $V_{M,d} \subseteq B$. Let us
assume there exists a max clique $B' \in \mathcal{M}_G$ different from $B$ with $B' \cap V_{M,d} \neq \emptyset$. According
to Property (A) we have $B \cap V_{M,d} \neq B' \cap V_{M,d}$, and therefore $B' \cap V_{M,d} \subsetneq V_{M,d}$. Since $V_{M,d}$
is a module, $B' \cup V_{M,d}$ is a clique, a contradiction to $B'$ being a max clique. □

Claim 3. Let $(M, d) \in P'$ and $V_{M,d}$ be a module in $\mathcal{W}_G^{\mathrm{dec}} \cup \mathcal{W}_G^{\mathrm{con}}$. Further, let $0 < d' < d$ be
such that $V_{M,d'} \subsetneq V_{M,d}$, and let $A \neq \emptyset$ be the set of apices of $G_{M,d}$. Then $V_{M,d'} \subseteq V_{M,d} \setminus A$.

Proof. If $V_{M,d}$ is a clique, then according to Claim 2 max clique $M$ is the only max clique in
$\mathcal{M}_G$ with $M \cap V_{M,d} \neq \emptyset$. Thus, $V_{M,d} = V_{M,1}$ and there does not exist a $d'$ with $0 < d' < d$
such that $V_{M,d'} \subsetneq V_{M,d}$.

Now let $V_{M,d}$ be not a clique. Further, let $\mathcal{C}$ be the set of max cliques $C \in \mathcal{M}_G$ with
$C \cap V_{M,d} \neq \emptyset$ and $c := |\mathcal{C}|$. In the following we show that $a \in V_{M,d}$ is an apex of $G_{M,d}$ if and
only if $\mathrm{span}(a) = c$. If $a \in V_{M,d}$ and $\mathrm{span}(a) = c$, then $a$ is contained in every max clique
of $G$ that has a non-empty intersection with $V_{M,d}$. As every vertex in $V_{M,d}$ is contained in
at least one max clique of $G$, which of course has a non-empty intersection with $V_{M,d}$, $a$ is
an apex of $G_{M,d}$. Now let $a$ be an apex of $G_{M,d}$ and let us assume that there exists a max
clique $C \in \mathcal{M}_G$ with $C \cap V_{M,d} \neq \emptyset$ and $a \notin C$. Apex $a$ is adjacent to all vertices in $C \cap V_{M,d}$,
and since $V_{M,d}$ is a module, $a$ is also adjacent to all vertices in $C \setminus V_{M,d}$. Therefore, $C \cup \{a\}$
is a clique, which is a contradiction to $C$ being a maximal clique of $G$.

From $\mathrm{span}(v) = c$ for all vertices $v \in A$, $\mathrm{span}(v) < c$ for all $v \in V_{M,d} \setminus A$ and $V_{M,d'} \subsetneq V_{M,d}$
it follows that $d' < c$. Consequently, $V_{M,d'} \subseteq V_{M,d} \setminus A$. □

To proceed with the proof of Lemma 7.10 we first show that if $D$ is a connected
component of a decomposition module $W \in \mathcal{W}_G^{\mathrm{dec}}$ and $M \in \mathcal{M}_G$ with $M \cap D \neq \emptyset$, then
there is an $n \in \mathbb{N}$ such that $(M, n) \in P$ and $V_{M,n} = D$.

We prove this by induction on the depth of the modular decomposition tree: Clearly,
if $D$ is a connected component of decomposition module $V$ (i.e., a connected component of
$G$), then $D = V_{M,|V|}$ for a max clique $M$ with $M \cap D \neq \emptyset$, and $(M, |V|) \in P$.

Now, let $D$ be a component of module $W \in \mathcal{W}_G^{\mathrm{dec}}$ with $W \neq V$. Let $c$ be the number
$c'$ of max cliques of $G$ intersecting with $W$ if $W$ contains an apex and $c' - 1$ if $W$ has no
apices. According to Property (C), $V_{M,c} = D$. Let $n$ be maximal with $V_{M,n} = V_{M,c}$. Then
$(M, n) \in P'$ and $D = V_{M,n}$. Choosing $(M, n)$ like that ensures that Property (1) is satisfied
for $(M,n)$.
It remains to show Property (2). Let $m' > n$ and let $V_{M,m'}$ be a module. According to
Property (1) we have $V_{M,n} \subsetneq V_{M,m'}$. Thus, Claim 1 implies $W \subseteq V_{M,m'}$.

First, let us assume there exists an apex $a$ of $G_{M,m'}$. If there exists an apex of $G_{M,m'}$ in
$V_{M,m'} \setminus W$, we have shown Property (2). Thus, let us assume all apices of $G_{M,m'}$, in particular
$a$, are in $W$. Since $W$ is a module and $a \in W$, the vertex sets $V_{M,m'} \setminus W$ and $W$ must be
completely connected. If $W$ contains two vertices $w, w'$ that are not adjacent, then in the
minimal interval representation the interval of each vertex in $V_{M,m'} \setminus W$ has to intersect the
intervals of both $w$ and $w'$. Thus, the intervals of all vertices in $V_{M,m'} \setminus W$ intersect with
each other and each vertex in $V_{M,m'} \setminus W$ is an apex, a contradiction. Let us assume $W$ is a
clique. Let $W^*$ be the parent module of $W$ in the modular decomposition tree of $G$. Since
$W$ is a decomposition module, $|W| > 1$ and $W^*$ contains either an apex, or is connected
and contains no apices. $W^*$ cannot contain an apex, because then all vertices in $W^*$ form
a clique and $W$ is not in $W_{G[W^*]}$. If $W^*$ is connected and contains no apices, then $W = S_{\mathcal{C}}$
for $\mathcal{C} \in \mathfrak{P}_{G[W^*]}$ where $\mathcal{C}$ is a set of max cliques of $G[W^*]$ with $|\mathcal{C}| \geq 2$ (see Section 7.3). As
$W$ is connected, $W = V_{M,n}$. According to Claim 2 there exists only one max clique $C$ of
$G$ with $C \cap W \neq \emptyset$. Consequently, $C' := C \cap W^*$ is the only max clique in $G[W^*]$ with
$C' \cap W \neq \emptyset$, a contradiction. Hence, $W$ cannot be a clique.

Now let us assume that there does not exist an apex of $G_{M,m'}$. Thus, $\sim_{G_{M,m'}}$ is
constructed as described in Section 7.4. Let $W'$ be the parent module $W^*$ of $W$ in the
modular decomposition tree of $G$ if $W^*$ is a decomposition module, or if $W^*$ is a component
module, let $W'$ be the parent of module $W^*$. Then $W'$ is a decomposition module. Further,
let $D'$ be the component of $W'$ that contains $W$. Notice that no matter what set we chose
for $W'$, we have $D' = W^*$. According to Property (C) there exists an $n' \in [|V|]$ such that
$D' = V_{M,n'}$. Let $n'$ be maximal with that property. Therefore, $W^* = V_{M,n'}$ and $W^*$ is a
component of a decomposition module. By inductive assumption we have $(M, n') \in P$. If
$V_{M,n'} = V_{M,m'}$, then $V_{M,n}$ is a subset of equivalence class $W$ of $\sim_{G_{M,m'}}$ with more than
one vertex and we are done. Therefore, let us assume $V_{M,n'} \neq V_{M,m'}$. If $n' < m'$, then
$V_{M,n} \subseteq W \subseteq W^* = V_{M,n'} \subsetneq V_{M,m'}$. As $(M,n')$ satisfies Property (2) and there does not exist
an apex in $G_{M,m'}$, the set $V_{M,n'}$, and therefore also the set $V_{M,n} \subseteq V_{M,n'}$, is a subset of an
equivalence class of $\sim_{G_{M,m'}}$ with more than one vertex.

It remains to consider $m' < n'$ where $V_{M,n'} \neq V_{M,m'}$. Then $V_{M,n} \subseteq W \subseteq V_{M,m'} \subsetneq
V_{M,n'} = W^*$. If $W^* = V_{M,n'}$ contains an apex, then $W = V_{M,n'} \setminus A$ where $A$ is the set of
apices of $G_{M,n'}$. According to Claim 3, $V_{M,m'} \subseteq V_{M,n'} \setminus A$. But this implies $V_{M,m'} \subseteq W$, a
contradiction.

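Finally, let us assume $W^* = V_{M,n'}$ is connected and does not contain an apex. Then
$W = S_{\mathcal{C}}$ for a $\mathcal{C} \in \mathfrak{P}_{G[W^*]}$ with $|\mathcal{C}| \geq 2$ where $\mathfrak{P}_{G[W^*]}$ is the set of all maximal proper subsets
$\mathcal{C}$ of $\mathcal{M}_{G[W^*]}$, the set of max cliques of $G[W^*]$, with the property that for any $B \in \mathcal{M}_{G[W^*]} \setminus
\mathcal{C}$ we have $C \cap B = C' \cap B$ for all $C, C' \in \mathcal{C}$. For all $C \in \mathcal{M}_{G[W^*]}$ with $C \cap V_{M,m'} \neq \emptyset$ let $f(C)$
be the set $C \cap V_{M,m'}$. As $V_{M,m'}$ is a module, the set $\{f(C) \mid C \in \mathcal{M}_{G[W^*]},\ C \cap V_{M,m'} \neq \emptyset\}$
is the set $\mathcal{M}_{G_{M,m'}}$ of max cliques of $G_{M,m'}$. Let $f(\mathcal{C})$ be the set $\{f(C) \mid C \in \mathcal{C}\}$. Then $f(\mathcal{C})$
is exactly the set of max cliques of $G_{M,m'}$ that have a non-empty intersection with $W$. Let
$f(C), f(C') \in f(\mathcal{C})$ and $f(B) \in \mathcal{M}_{G_{M,m'}} \setminus f(\mathcal{C})$. Then $f(C) \cap f(B) = f(C') \cap f(B)$, because
$C \cap B = C' \cap B$ and therefore $(C \cap V_{M,m'}) \cap (B \cap V_{M,m'}) = (C' \cap V_{M,m'}) \cap (B \cap V_{M,m'})$.

Further, $|f(\mathcal{C})| > 1$, since $|\mathcal{C}| > 1$ and for $C, C' \in \mathcal{C} \subseteq \mathcal{M}_{G[W^*]}$ with $C \neq C'$ we have
$C \cap W \neq C' \cap W$ according to Property (A). Consequently, $(C \cap V_{M,m'}) \cap W \neq (C' \cap V_{M,m'}) \cap W$
and $f(C) \neq f(C')$ for max cliques $f(C), f(C') \in f(\mathcal{C})$. We obtain that there exists a subset
$f(\mathcal{C}')$ of $\mathcal{M}_{G_{M,m'}}$ with $f(\mathcal{C}) \subseteq f(\mathcal{C}')$ such that $f(\mathcal{C}') \in \mathfrak{P}_{G_{M,m'}}$. As there exists no max
clique $f(B) \in \mathcal{M}_{G_{M,m'}} \setminus f(\mathcal{C})$ with $f(B) \cap W \neq \emptyset$, we have $W \subseteq S_{f(\mathcal{C}')}$, and we have shown that
$V_{M,n}$ is a subset of equivalence class $S_{f(\mathcal{C}')}$ of $\sim_{G_{M,m'}}$ with more than one vertex.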

For the other direction, let $(M, n) \in P$; we need to show that $V_{M,n}$ is a component of a
decomposition module. We prove this by induction on $n$. Clearly, this holds for $n = |V(G)|$,
so let $n < |V(G)|$. Let $p$ be minimal such that $p > n$ and $(M,p) \in P$. Since $(M, |V|) \in P$,
such a number exists. By inductive assumption we know that $V_{M,p}$ is a component of a
decomposition module. Thus, $V_{M,p}$ is a module occurring in $V(T(G))$, the vertices of the
modular decomposition tree of $G$.

Since $(M,n) \in P$, $(M, n)$ satisfies Property (2). Thus, $V_{M,n}$ is a subset of an equivalence
class of $\sim_{G_{M,p}}$ with more than one vertex or there exists an apex of $G_{M,p}$ in $V_{M,p} \setminus V_{M,n}$.

Let $V_{M,n}$ be a subset of an equivalence class $W$ of $\sim_{G_{M,p}}$ with more than one vertex.
As $V_{M,p}$ is connected, the equivalence class $W$ is a decomposition module. Let $D$ be the
connected component of $W$ that contains $V_{M,n}$. If $V_{M,n} = D$, then $V_{M,n}$ is a component
of a decomposition module and we are done. If $V_{M,n}$ is a proper subset of $D$, we obtain
a contradiction to the choice of $p$, since we have already shown that for component $D$
of decomposition module $W$ there must exist an $m \in [|V|]$ such that $(M,m) \in P$ and
$V_{M,m} = D$, and $n < m < p$.

Now let there be a vertex $a \in V_{M,p} \setminus V_{M,n}$ that is an apex of $G_{M,p}$. Let $A$ be the set
of apices of $G_{M,p}$. According to Claim 3 we have $V_{M,n} \subseteq V_{M,p} \setminus A$. Further, $|V_{M,p} \setminus A| = 1$
implies that $v \in V_{M,p} \setminus A$ is also an apex. Consequently, $|V_{M,p} \setminus A| > 1$. Therefore, we have
either shown that $V_{M,n}$ is a component of equivalence class $V_{M,p} \setminus A$ of $\sim_{G_{M,p}}$ with more
than one vertex or obtain a contradiction to the choice of $p$. □

Corollary 7.11. There is an STC+C-formula $\varphi(x,y,z)$ such that for all interval graphs
$G = (V,E)$, all $v, w \in V$, and all $n \in [|V|]$, we have $G \models \varphi[v,w,n]$ iff $M = N^c(v) \cap N^c(w)$
is a max clique of $G$ and $(M, n) \in P$.
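
The way a pair of vertices represents a max clique can be checked on small graphs as follows. The sketch assumes that $N^c(v)$ denotes the closed neighbourhood of $v$ (the notation is introduced earlier in the paper and not repeated here); the function names are hypothetical.

```python
def closed_nbhd(adj, v):
    """Closed neighbourhood of v: its neighbours together with v itself."""
    return frozenset(adj[v]) | {v}


def is_max_clique(adj, M):
    """True if M is a non-empty clique that cannot be extended by any vertex."""
    if not M:
        return False
    is_clique = all(u == w or w in adj[u] for u in M for w in M)
    extendable = any(v not in M and M <= frozenset(adj[v]) for v in adj)
    return is_clique and not extendable


def clique_from_pair(adj, v, w):
    """Return the max clique represented by (v, w), or None if there is none."""
    M = closed_nbhd(adj, v) & closed_nbhd(adj, w)
    return M if is_max_clique(adj, M) else None
```

Not every pair represents a max clique: two non-adjacent twins, for example, only share their common neighbours, which may form a clique that is not maximal.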

We are now ready to define the coloured modular decomposition tree. An illustration
of the tree can be found in Figure 7.

Formally, the coloured modular decomposition tree is defined as $T = T_G = (V_T, E_T)$,
where the set $V_T$ of nodes and the set $E_T$ of edges of $T$ are defined as follows. $V_T$ is the
union of the following sets:

• the set $\mathcal{V}$ of component vertices $v_{V_{M,n}}$, one for each set $V_{M,n}$ with $(M, n) \in P$,

• the set $\mathcal{A}$ of arrangement vertices $a_{\{\prec_Q\},V_{M,n}}$ where $\{\prec_Q\}$ is the singleton set of
the distinguished minimal order on $L_{G_{M,n}}$'s max cliques if $\mathcal{K}(L_{G_{M,n}})$ is not order
isomorphic under its two linear orderings (recall the definition of $\mathcal{K}(L_{G_{M,n}})$ from
Section 7.5). If $\mathcal{K}(L_{G_{M,n}})$ is order isomorphic under its two linear orderings, then
max clique $Q$ identifies an order $\prec_Q$, namely, the order where $Q_{L_{G_{M,n}}}$ occurs first
(see Section 7.3 for the definition of $Q_{L_{G_{M,n}}}$). $Q$ defines both orders if $Q_{L_{G_{M,n}}}$ is
located in the middle. Thus, for each $Q$ the set $\{\prec_Q\}$ is the set of orders containing
either only one of the isomorphic orders or both. Consequently, for each set $V_{M,n}$
there are at most three arrangement vertices of the form $a_{\{\prec_Q\},V_{M,n}}$.

• the set $\mathcal{S}$ of module vertices $s_{W_A,V_{M,n}}$ for which $A$ is a max clique of $G$, and $W_A$
is the vertex in $L_{G_{M,n}}$ ($W_A$ is a module of $V_{M,n}$ with more than one vertex) that
contains vertices of $A$, and
Figure 7: An interval graph and its coloured modular decomposition tree. Component
vertices $v_U$ are represented together with the interval graph $L_U$ labeling them.
The colours of module vertices are indicated in the gray fields next to them.



• $\{s_V\}$, where $s_V$ is a special vertex acting as the root of $T$.

We colour the vertices in $\mathcal{V}$ by assigning to each $v_{V_{M,n}} \in \mathcal{V}$ the ordered graph $\mathcal{K}(L_{G_{M,n}})$. The
vertices in $\mathcal{A}$ remain uncoloured and may therefore be exchanged by an automorphism of
$T$ whenever their subtrees are isomorphic. Each $s_{W_A,V_{M,n}} \in \mathcal{S}$ is coloured with the multiset
of integers corresponding to the positions that the max clique $A_{L_{G_{M,n}}}$ takes in the orders
of $L_{G_{M,n}}$. The edge relation $E_T$ of $T$ is now defined in a straightforward manner, with all
edges directed away from the root $s_V$.

• $s_V$ is connected to all $v_{V_{M,n}} \in \mathcal{V}$ with $n = |V|$.

• Each $v_{V_{M,n}} \in \mathcal{V}$ is connected to all vertices in $\mathcal{A}$ of the form $a_{\{\prec_Q\},V_{M,n}}$ with
$Q \cap V_{M,n} \neq \emptyset$. Therefore, $v_{V_{M,n}}$ is connected to at most three vertices.

• Each $a_{\{\prec_Q\},V_{M,n}} \in \mathcal{A}$ is connected to all those $s_{W_A,V_{M,n}} \in \mathcal{S}$ so that $\{\prec_Q\}$ is the
set of orders of $L_{V_{M,n}}$ under which module $W_A \in V(L_{G_{M,n}})$ attains its minimal
position, that is, for every max clique $Q$ that intersects with a module $W$ of $V_{M,n}$
with $|W| > 1$, vertex $a_{\{\prec_Q\},V_{M,n}} \in \mathcal{A}$ is connected to $s_{W_Q,V_{M,n}} \in \mathcal{S}$.

• Every $s_{W_A,V_{M,n}} \in \mathcal{S}$ is connected to those $v_{V_{M',n'}} \in \mathcal{V}$ for which $V_{M',n'}$ is a connected
component of the module $W_A$, that is, for each max clique $A$ the vertex $s_{W_A,V_{M,n}} \in \mathcal{S}$
is connected to $v_{V_{A,n'}} \in \mathcal{V}$ with $n' = \max\{m < n \mid (A,m) \in P\}$.

The point of the arrangement vertices A is to ensure that the order of submodules 
is properly accounted for. If our modular tree did not have such a safeguard, exchang- 
ing modules in symmetric positions might give rise to a non-isomorphic graph, but it 
would not change the tree, so T would be useless for the task of distinguishing between 
these two graphs. 

We will later need STC+C-definability of this coloured tree. Thus, notice that the tree's
vertices are equivalence classes, which are STC+C-definable. Also the edge relation and the
colours are STC+C-definable (Lemma 7.9).

Lemma 7.12 below shows that our modular trees are a complete invariant of interval
graphs, so modular trees can be used to tell whether two interval graphs are isomorphic.

Lemma 7.12 ([18],[20]). Let $G$ and $H$ be interval graphs. If their modular trees are isomorphic,
then so are $G$ and $H$. □

The graphs $L_{G_{M,n}}$ resemble the concept of overlap components used in [18] for the
definition of a similar kind of modular tree. Overlap components are connected components
of the subgraph of $G$ in which only those edges are present for which the neighbourhood of
neither endpoint is contained in the neighbourhood of the other (intuitively, their intervals
overlap). It can be checked that overlap components and graphs $L_{G_{M,n}}$ only differ in the
way they treat vertices that are contained in just one max clique: overlap components treat
them as further modules (which they trivially are), whereas the $L_{G_{M,n}}$ graphs directly put them
into their unambiguous places. In [18] the authors show Lemma 7.12 for this similar kind
of modular tree. A detailed proof of Lemma 7.12 can be found in [20].

7.7. Total Preorder on Coloured Directed Trees. We can make use of the STC+C-definable
modular decomposition tree, and define a total preorder on the vertices of $T_G$, that
is, a linear order on the isomorphism classes of the (coloured) subtrees of $T_G$ identified by
its root vertices.

For our purposes, we define a coloured directed tree as a tuple $T = (V,E,L)$, where
$(V, E)$ is a directed tree and $L \subseteq V \times N(V)^2$ is a relation that assigns to each vertex
$a \in V$ a colour $L_a := \{(m, n) \mid (a,m,n) \in L\}$. It is easy to bring the coloured modular
decomposition tree into this form. For example, if $a$ is a component vertex, say $v_{V_{M,n}}$, then
$L_a$ consists of all tuples $(m, n)$, where $(m, n)$ is an edge in the colour of $a$ (i.e., an edge in
the canon of $L_{V_{M,n}}$ by which $a$ is coloured in $T_G$). Furthermore, if $a$ is a module vertex, say
$s_{W_A,V_{M,n}}$, then $L_a$ consists of all tuples $(m,n)$, where $m$ occurs $n$ times in the colour of $a$.
In all other cases, we simply leave $L_a$ empty.

We let $\varphi_{\trianglelefteq}(x,y)$ be the formula such that for all coloured directed trees $T$, assignments
$\alpha$ and $a, b \in V(T)$:

$(T, \alpha) \models \varphi_{\trianglelefteq}[a, b] \iff L_a$ is lexicographically less than or equal to $L_b$.

Then $\varphi_{\trianglelefteq}$ defines a total preorder $\trianglelefteq$ on the vertices of any coloured directed tree.
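
As an illustration of the colour encoding and the resulting preorder, the following sketch turns the multiset colour of a module vertex into a set of pairs $(m,n)$ and compares two colours. The text does not spell out how two sets of pairs are compared lexicographically; the sketch assumes that their sorted lists of pairs are compared, and both function names are hypothetical.

```python
from collections import Counter


def module_colour(positions):
    """Encode a multiset of positions as pairs (m, n): m occurs n times."""
    return {(m, n) for m, n in Counter(positions).items()}


def colour_leq(La, Lb):
    """Lexicographic comparison of colours via their sorted pair lists
    (an assumed reading of 'lexicographically less than or equal')."""
    return sorted(La) <= sorted(Lb)
```

Because any two sorted lists are comparable, `colour_leq` is total and transitive, but distinct vertices may carry equal colours, which is why it yields a preorder rather than a linear order on vertices.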

Let $\varphi_{\cong}(x,y)$ and $\varphi_{\prec}(x,y)$ be as defined in Section FOI and Section H~Tj, respectively. If
we identify each subtree of a directed tree with its root vertex, then the LREC=-formula
$\varphi_{\preceq}(x,y) := \varphi_{\cong}(x,y) \lor \varphi_{\prec}(x,y)$ defines a linear order $\preceq$ on the isomorphism classes of the
subtrees of a directed tree.

We define a refinement $\preceq'$ of $\trianglelefteq$ by letting $v \preceq' w$ whenever $v \triangleleft w$, or: $v \trianglelefteq w$ and $w \trianglelefteq v$
and $v \preceq w$. It should be obvious how to modify the formula $\varphi_{\preceq}(x,y)$ to an LREC=-formula
$\varphi_{\preceq'}$ defining $\preceq'$.

7.8. Canonisation. This section deals with the canonisation of interval graphs, that is,
how to construct an LREC=-formula $\kappa'(p,q)$ such that for each interval graph $G$ we have
$G \cong ([|V(G)|], \kappa'[G;p,q])$. As a result we obtain the following:

Theorem 7.13. LREC= captures LOGSPACE on the class of all interval graphs.

We use the modular decomposition tree and the total preorder on its vertices for canon- 
isation. We apply 1-recursion on the modular decomposition tree, and as we have done for 
canonising trees we build the canon from the leaves to the root of the tree. Recursively, we 
construct the canon by first building the disjoint union of the canons of the components of 
submodules, then use the arrangement vertices to insert all submodules at the correct side 
and build the canon of the corresponding component of a module. 

In the following we explain the canonisation procedure in more detail. The following
lemma shows that it suffices to give an LREC=-formula $\kappa(p, q)$ such that for every interval
graph $G$ we have $G \cong ([|V(G)|], \kappa[T_G;p,q])$. It follows from Lemma 6.6 and the fact that
the coloured modular decomposition tree of an interval graph is STC+C-definable.

Lemma 7.14. If there exists an LREC=-formula $\kappa(p,q)$ such that for all interval graphs
$G$ we have $G \cong ([|V(G)|], \kappa[T_G;p,q])$ and $\kappa[T_G;p,q] \subseteq [|V(G)|]^2$, then there also exists an
LREC=-formula $\kappa'(p',q')$ such that for all interval graphs $G$, $G \cong ([|V(G)|], \kappa'[G;p',q'])$.

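Proof. As pointed out at the end of Section 7.6, the coloured modular decomposition tree of
an interval graph $G$ is definable in STC+C, and thus in LREC=. That is, there are LREC=-formulae
$\theta_V(\bar u)$, $\theta_\approx(\bar u,\bar v)$, $\theta_E(\bar u,\bar v)$ and $\theta_L(\bar u,\bar q)$, where $\bar u,\bar v$ are compatible tuples and $\bar q$ is a
tuple of number variables, such that for all interval graphs $G$ and all assignments $\alpha$,

• $\theta_\approx[G,\alpha;\bar u,\bar v]$ is an equivalence relation $\approx$,

• $\theta_V[G,\alpha;\bar u]/{\approx}$ is the set of vertices of $T_G$,

• $\theta_E[G,\alpha;\bar u,\bar v]/{\approx} := \{(\bar a/{\approx}, \bar b/{\approx}) \mid (\bar a,\bar b) \in \theta_E[G,\alpha;\bar u,\bar v]\}$ is the edge relation of $T_G$,

• and $\theta_L[G,\alpha;\bar u,\bar q]/{\approx}$ is the colour-relation of the modular decomposition tree $T_G$.

We now apply Lemma 6.6 with the transduction $\Theta = (\theta_V(\bar u), \theta_\approx(\bar u,\bar v), \theta_E(\bar u,\bar v), \theta_L(\bar u,\bar q))$ to
obtain an LREC=-formula $\kappa^{-\Theta}(\bar p',\bar q')$ such that for all $\bar m,\bar n \in N(G)^{|\bar p'|}$, $G \models \kappa^{-\Theta}[\bar m,\bar n]$ iff
$\langle\bar m\rangle_G, \langle\bar n\rangle_G \in N(\Theta[G])$ and $\Theta[G] \models \kappa[\langle\bar m\rangle_G, \langle\bar n\rangle_G]$. Note that $\Theta[G] = T_G$. As $\kappa[T_G;p,q] \subseteq
[|V(G)|]^2$, the condition $\langle\bar m\rangle_G, \langle\bar n\rangle_G \in N(\Theta[G])$ can be replaced by $\langle\bar m\rangle_G, \langle\bar n\rangle_G \in N(G)$.
Hence, the tuples $\bar p', \bar q'$ of number variables in $\kappa^{-\Theta}$ can be identified with single number
variables $p',q'$, which yields the desired formula $\kappa'(p',q')$. □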

In general, the canonisation procedure is similar to the one of directed trees. To apply
1-recursion we use a graph $\mathsf{G} = (\mathsf{V}, \mathsf{E})$ with labels $\mathsf{C}(v) \subseteq N$ for all $v \in \mathsf{V}$. We let $\mathsf{V} := V(T_G) \times
N(T_G)^2$ be the vertices of $\mathsf{G}$ and for all component vertices $v_{V_{M,n}} \in \mathcal{V}$, $(v_{V_{M,n}},p,q) \in \mathsf{V}$
stands for "$(p,q) \in X_{V_{M,n}}$" where $X_{V_{M,n}}$ is the edge relation of an isomorphic copy
$([|V_{M,n}|], X_{V_{M,n}})$ of $G_{M,n}$.

In the following we explain the edge relation $\mathsf{E}$ and labels $\mathsf{C}$ of graph $\mathsf{G}$.
Edges introduced by module vertices.

In $T_G$, each vertex $s_{W_A,V_{M,n}} \in \mathcal{S}$ is connected to those $v_{V_{M',n'}} \in \mathcal{V}$ for which $V_{M',n'}$ is a
connected component of the module $W_A$. Thus, we can use the available total preorder $\preceq'$
on the children of $s_{W_A,V_{M,n}}$ (cf. Section 7.7) to construct the canon of the disjoint union
of the children's canons from the canons of the children. For a vertex $s \in \mathcal{S}$ and a child
$v := v_{V_{M,n}} \in \mathcal{V}$ of $s$, let $D_v$ be the set of all children $v'$ of $s$ with $v' \prec' v$, and $e_v$ be the
number of children $v'$ of $s$ defining modules isomorphic to $V_{M,n}$ (i.e., $v' \preceq' v$ and $v \preceq' v'$).
For all $(p,q) \in N(T_G)^2$ and all $i \in [0, e_v - 1]$, we let $a := (s, p_{v,i} + p, p_{v,i} + q)$ have an edge
to $(v,p,q)$ where $p_{v,i} := \sum_{v_{V_{M',n'}} \in D_v} |V_{M',n'}| + i \cdot |V_{M,n}|$ and define $\mathsf{C}(a) = \{e_v\}$. Notice that
here we can have an in-degree greater than 1.
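
The offsets $p_{v,i}$ simply lay out the canons of the children one after another on the number sort, with isomorphic children occupying consecutive blocks. A minimal sketch of this bookkeeping follows; it is not the LREC= construction itself. It assumes each child is given as a pair `(key, size)`, where `key` is a hypothetical rank in the total preorder (equal keys stand for isomorphic children, which therefore have equal size).

```python
def union_offsets(children):
    """Offsets p_{v,i} for the disjoint union of the children's canons.

    children: list of (key, size) pairs.
    Returns a dict mapping each key to the list [p_{v,0}, ..., p_{v,e_v-1}].
    """
    offsets = {}
    for key, size in children:
        if key in offsets:
            continue
        smaller = sum(s for k, s in children if k < key)   # sum over D_v
        e_v = sum(1 for k, _ in children if k == key)      # isomorphic copies
        offsets[key] = [smaller + i * size for i in range(e_v)]
    return offsets


def shifted(edges, offset):
    """Copy of a canon's edge set moved by `offset` on the number sort."""
    return {(offset + p, offset + q) for p, q in edges}
```

With children of sizes 2, 3, 3, 1 (the two middle ones isomorphic), the blocks start at offsets 0, 2, 5 and 8.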

Edges introduced by arrangement vertices.

Let us consider a vertex $a_{\{\prec_Q\},V_{M,n}} \in \mathcal{A}$. Its children in $T_G$ are vertices $s_{W_A,V_{M,n}}$ for specific
submodules of the module $V_{M,n}$, and we need to integrate the canons of them into the canon
$\mathcal{K}(L_{V_{M,n}})$ of $L_{V_{M,n}}$. The canon $\mathcal{K}(L_{V_{M,n}})$ is STC+C-definable (Lemma 7.9) and we assume
it to be assigned to the first part $[1, |V(L_{V_{M,n}})|]$ of the number sort. Notice that on the
number sort we have a distinguished ordering $<_N$ of the max cliques.

If $a_{\{\prec_Q\},V_{M,n}} \in \mathcal{A}$ has no sibling, then we have a distinguished order of the max cliques
of $L_{V_{M,n}}$, and we can integrate each canon of a submodule into $\mathcal{K}(L_{V_{M,n}})$ according to the
colour of its vertex $s$. By integrating a submodule, we mean the following: We first sum up
the size of $\mathcal{K}(L_{V_{M,n}})$ and the sizes of all submodules defined by children of $a_{\{\prec_Q\},V_{M,n}}$ with
smaller colours, and increase each vertex of the canon of the submodule by this number.
Further, in the canon $\mathcal{K}(L_{V_{M,n}})$ we want to replace the smallest vertex $z$ that lies in the
max clique that is at the position defined by the colour of $s$ and in no other max clique
by the modified canon of the submodule. In order to do that, we add an edge between
all vertices that are adjacent to $z$ and all vertices of the modified canon of the submodule.
For $a_{\{\prec_Q\},V_{M,n}}$, we define the out-going edges of $a_{\{\prec_Q\},V_{M,n}}$ in $\mathsf{G}$ such that, if $X$ denotes
the relation defined by the final LREC=-formula, the graph with edge relation $\{(p,q) \in
N(T_G)^2 \mid ((a_{\{\prec_Q\},V_{M,n}},p,q),\ell) \in X \text{ for large enough } \ell\}$ consists of the modified canons of
the submodules and all new edges. Note that we have not yet removed the replaced vertices.
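
The effect of integrating one submodule can be sketched on explicit edge sets. The function below is only an illustration of the graph operation described above, not of its definition in LREC=; the name `integrate` and the encoding (symmetric edge sets over positive integers, submodule canon on $[1, \text{sub\_size}]$) are assumptions. As in the text, the replaced vertex $z$ is not removed at this stage.

```python
def integrate(canon_edges, z, sub_edges, sub_size, offset):
    """New edges created by integrating a submodule canon at vertex z.

    canon_edges: symmetric edge set of the canon K(L)
    z:           the vertex of K(L) to be replaced (it is kept here)
    sub_edges:   symmetric edge set of the submodule's canon on [1, sub_size]
    offset:      number added to every vertex of the submodule's canon
    """
    moved = {(p + offset, q + offset) for p, q in sub_edges}
    new_vertices = range(offset + 1, offset + sub_size + 1)
    nbrs = ({q for p, q in canon_edges if p == z}
            | {p for p, q in canon_edges if q == z})
    joins = {(u, w) for u in nbrs for w in new_vertices}
    joins |= {(w, u) for u in nbrs for w in new_vertices}
    return moved | joins
```

For a canon that is the path 1, 2, 3 with $z = 3$ and a submodule consisting of a single edge, the shifted copy occupies the vertices 4 and 5, and both are joined to vertex 2.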

If $a_{\{\prec_Q\},V_{M,n}}$ has siblings, a single child, and the colour of the single child contains two
equal positions, we know we have to insert the canon of its child in the middle (regarding
the ordering of the max cliques) of $\mathcal{K}(L_{V_{M,n}})$. For such a vertex $a_{\{\prec_Q\},V_{M,n}}$ we construct
the edges of $\mathsf{G}$ so that we obtain the following graph on the number sort: We add the size
of $\mathcal{K}(L_{V_{M,n}})$ to each vertex of the canon of the submodule, and add all edges that would be
generated if we inserted the modified canon into the canon $\mathcal{K}(L_{V_{M,n}})$ replacing the smallest
vertex in the middle max clique.

Now, let $a_{\{\prec_{Q_1}\},V_{M,n}}$ and $a_{\{\prec_{Q_2}\},V_{M,n}}$ be siblings where the colour of at least one child
contains different positions. We determine their order with respect to the total preordering.
Say, $a_{\{\prec_{Q_1}\},V_{M,n}} \prec' a_{\{\prec_{Q_2}\},V_{M,n}}$. Then we want to integrate the submodules of $a_{\{\prec_{Q_1}\},V_{M,n}}$
all into the first half (with respect to their ordering) of the max cliques of the canon $\mathcal{K}(L_{V_{M,n}})$, and the
submodules of $a_{\{\prec_{Q_2}\},V_{M,n}}$ into the second half. Therefore, we create the edges of G in such a way
that the graph on the number sort at vertex $a_{\{\prec_{Q_1}\},V_{M,n}}$ is as follows: Each child of vertex
$a_{\{\prec_{Q_1}\},V_{M,n}}$ represents a certain submodule of $V_{M,n}$. We sum up the size of $\mathcal{K}(L_{V_{M,n}})$, the
size of the submodule in the middle if it exists, and the sizes of all submodules defined by
children of $a_{\{\prec_{Q_1}\},V_{M,n}}$ with smaller colours, and add this value to each vertex of the canon
of this submodule. Finally, we assume we insert each of these modified canons into
the max clique specified by the smaller value contained in the colour of the corresponding
vertex, in the same way we did before, and add all newly created edges to the modified
canons of the submodules. For vertex $a_{\{\prec_{Q_2}\},V_{M,n}}$ we construct the graph on the number
sort analogously, except that we additionally add the sum of the sizes of all submodules
defined by children of $a_{\{\prec_{Q_1}\},V_{M,n}}$ to the vertices of the canons of the submodules.
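
The offset arithmetic of this case can be illustrated outside the logic. The following Python sketch is not part of the construction: it assumes that each canon is given as a set of edges over the vertices 1, ..., size, and the names `shift_canon`, `place_submodules` and `earlier_sibling_total` are hypothetical. It only shows how the shifted number ranges arise, not how the new edges to the max cliques are added.

```python
def shift_canon(edges, offset):
    """Add `offset` to both endpoints of every edge of a canon."""
    return {(p + offset, q + offset) for (p, q) in edges}


def place_submodules(base_size, middle_size, submodules, earlier_sibling_total=0):
    """Shift each submodule canon past everything that is placed before it.

    base_size:             size of the canon K(L_{V_{M,n}})
    middle_size:           size of the submodule in the middle (0 if none)
    submodules:            list of (size, edges) pairs, sorted by colour
    earlier_sibling_total: total size of the submodules of the smaller
                           sibling (0 for the smaller sibling itself)
    """
    placed = []
    offset = base_size + middle_size + earlier_sibling_total
    for size, edges in submodules:
        placed.append(shift_canon(edges, offset))
        offset += size  # submodules with larger colours start after this one
    return placed
```

For the larger of the two siblings, `earlier_sibling_total` would be set to the sum of the sizes of all submodules of the smaller sibling, which mirrors the additional summand described above.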

If $a_{\{\prec_{Q_1}\},V_{M,n}}$ and $a_{\{\prec_{Q_2}\},V_{M,n}}$ are equivalent with respect to the total preorder, we
insert the submodules of $a_{\{\prec_{Q_i}\},V_{M,n}}$ for i = 1,2 at both sides. We position the submodules
according to both values that are contained in their colours. Thus, if there is no submodule
that belongs in the middle at vertex $a_{\{\prec_{Q_i}\},V_{M,n}}$, for i = 1,2, the edge relation of G almost
enables us to define the canon of module $V_{M,n}$, except that we still need to remove the
vertices that were replaced.

For each $a \in A$ that fits in the last case, we let $C(a,p,q) = \{2\}$, otherwise $C(a,p,q) = \{1\}$,
for all $p,q \in N(T_G)$. Note that only in the last case do we obtain in-degrees larger than 1;
there, the in-degree is 2.

Edges introduced by component vertices. 

Let $v = v_{V_{M,n}} \in V$. In the preceding step, we introduced edges for arrangement vertices
$a_{\{\prec_Q\},V_{M,n}}$ so that, if X denotes the relation defined by the final LREC=-formula in an interval
graph whose coloured modular decomposition tree is $T_G$, the graph with edge relation
$\{(p,q) \in N(T_G)^2 \mid ((a_{\{\prec_Q\},V_{M,n}},p,q),\ell) \in X \text{ for large enough } \ell\}$ is almost a canon of $V_{M,n}$;
we still need to insert $\mathcal{K}(L_{V_{M,n}})$, and remove the vertices of $\mathcal{K}(L_{V_{M,n}})$ that correspond to
the submodules of $V_{M,n}$.

Recall from Lemma 7.9 that the canon $\mathcal{K}(L_{V_{M,n}})$ is STC+C-definable. The set of
vertices of $\mathcal{K}(L_{V_{M,n}})$ is $[1, |V(L_{V_{M,n}})|]$. Let R be the set of vertices that have to be
removed from $\mathcal{K}(L_{V_{M,n}})$, so that the resulting graph plus the edges from
$\{(p,q) \in N(T_G)^2 \mid ((a_{\{\prec_Q\},V_{M,n}},p,q),\ell) \in X \text{ for large enough } \ell\}$ is isomorphic to $V_{M,n}$. It is easy to define R
by considering the different cases as we did above.

Let $f(r) := r - d_r$, where $d_r = |\{s \in R \mid s < r\}|$. Then, the contracted canon
$Q := \{(f(p),f(q)) \mid (p,q) \in \mathcal{K}(L_{V_{M,n}})\}$ is STC+C-definable, and we assign it to the first
part $[1, |V(L_{V_{M,n}})| - |R|]$ of the number sort. Thus, we set $C(v,f(p),f(q)) = N(T_G)$ for
all $(p,q) \in \mathcal{K}(L_{V_{M,n}})$. Furthermore, for each child $a \in A$ of v, we include all edges from
$(v,f(p),f(q))$ to $(a,p,q)$ for all $p,q \in N(T_G) \setminus R$. Finally, for all $(p,q) \notin \mathcal{K}(L_{V_{M,n}})$ we set
$C(v,f(p),f(q)) = \{1\}$.
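
The renumbering f can be checked on small examples. The Python sketch below is an illustration only, not the STC+C definition: it assumes the canon is given as a set of edges over positive integers, the names `make_f` and `contract_canon` are hypothetical, and it restricts the contracted canon to edges whose endpoints both lie outside R (an assumption made here so that removed vertices do not collide with their successors under f).

```python
from bisect import bisect_left


def make_f(removed):
    """Return the map f(r) = r - |{s in removed : s < r}|."""
    removed_sorted = sorted(removed)

    def f(r):
        # bisect_left counts the elements of removed_sorted that are < r
        return r - bisect_left(removed_sorted, r)

    return f


def contract_canon(edges, removed):
    """Drop the vertices in `removed` and close the gaps with f."""
    f = make_f(removed)
    removed = set(removed)
    return {
        (f(p), f(q))
        for (p, q) in edges
        if p not in removed and q not in removed
    }
```

With R = {2, 4}, the surviving vertices 1, 3, 5 are mapped to 1, 2, 3, so the contracted canon occupies exactly the initial segment of length |V| - |R| of the number sort.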

Finishing the construction. 

In order to actually perform L-recursion we need sufficient "resources". Taking a look at the
in-degrees, we notice that they are only larger than one when we treat isomorphic connected 
components while building the disjoint union, or when the graph $V_{M,n}$ is symmetric and we
insert the submodules twice at both sides. Either way, an incoming degree of d means that 
we insert d disjoint isomorphic copies into the graph on the number sort. Hence, it suffices 
to use a binary resource term. 
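
To make the remark about in-degrees concrete: in the disjoint-union case, inserting d isomorphic copies can be read as placing the same canon d times on consecutive number ranges. The Python sketch below only illustrates this reading. The function name `disjoint_copies` and the edge-set representation are assumptions, and the sketch does not model the resource term itself or the symmetric case.

```python
def disjoint_copies(edges, size, d):
    """Place d copies of a graph on vertices 1..size side by side.

    Copy number i (counting from 0) occupies the range
    [i * size + 1, (i + 1) * size] of the number sort.
    """
    return {
        (p + i * size, q + i * size)
        for i in range(d)
        for (p, q) in edges
    }
```

For a single edge on two vertices and d = 2, this yields one copy on {1, 2} and a second one on {3, 4}.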

Remark 7.15. It is possible to show that there is no LREC+TC[$\{E\}$]-sentence $\varphi$ such that
for all connected interval graphs $G_1, G_2$ we have $G_1 \uplus G_2 \models \varphi$ if and only if $G_1 \cong G_2$. The
proof is based on similar ideas as the proof of Theorem 5.1.

8. Conclusion

We introduce the new logics LREC and LREC=, extending first-order logic with counting by 
a recursion operator that can be evaluated in logarithmic space. By capturing LOGSPACE 
on trees and interval graphs, we obtain the first nontrivial descriptive characterisations of 
LOGSPACE on natural classes of unordered structures. It would be interesting to extend 
our results to further classes of structures such as the class of planar graphs or classes of 
graphs of bounded tree width. 

The expressive power of LREC= is not yet well understood. For example, it is an open
question whether directed graph reachability is expressible in LREC=, and even whether
LREC= has the same expressive power as FP+C. (Of course, assumptions from complexity
theory indicate that the answer to both questions is negative.) It is also an open question 
whether reachability on undirected trees is expressible in plain LREC. 

It is obvious that our capturing results can be transferred to nondeterministic logarithmic
space NL by adding a transitive closure operator to the logic. However, it would be
much nicer to have a natural "nondeterministic" variant of our limited recursion operator
that can express directed graph reachability and thus yields a logic that contains
TC. We leave it as an open problem to find such an operator.